Skip to content

[Routing] Handle very large set of dynamic routes #26169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 14, 2018

Conversation

nicolas-grekas
Copy link
Member

@nicolas-grekas nicolas-grekas commented Feb 13, 2018

Q A
Branch? master
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets -
License MIT
Doc PR -

Follow up of #26059; allows handling very long lists of dynamic routes.
Allows running https://github.com/tyler-sommer/php-router-benchmark seamlessly.

BTW, here are the result of running this benchmark:

R3 extension is not loaded. Skipping initialization for "Worst-case matching" test using R3.
R3 extension is not loaded. Skipping initialization for "First route matching" test using R3.

Worst-case matching

This benchmark matches the last route and unknown route. It generates a randomly prefixed and suffixed route in an attempt to thwart any optimization. 1,000 routes each with 9 arguments.

This benchmark consists of 10 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name Results Time + Interval Change
Symfony4 - unknown route (1000 routes) 995 0.0000085699 +0.0000000000 baseline
Symfony4 - last route (1000 routes) 999 0.0000086754 +0.0000001055 1% slower
FastRoute - unknown route (1000 routes) 980 0.0000305154 +0.0000219455 256% slower
FastRoute - last route (1000 routes) 999 0.0000529922 +0.0000444223 518% slower
Pux PHP - unknown route (1000 routes) 972 0.0003162730 +0.0003077032 3591% slower
Pux PHP - last route (1000 routes) 999 0.0004376847 +0.0004291148 5007% slower
Aura v2 - unknown route (1000 routes) 976 0.0138277517 +0.0138191818 161253% slower
Aura v2 - last route (1000 routes) 989 0.0138914190 +0.0138828491 161996% slower

First route matching

This benchmark tests how quickly each router can match the first route. 1,000 routes each with 9 arguments.

This benchmark consists of 5 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name Results Time + Interval Change
FastRoute - first route 999 0.0000016928 +0.0000000000 baseline
Pux PHP - first route 999 0.0000017381 +0.0000000453 3% slower
Symfony4 - first route 997 0.0000029818 +0.0000012890 76% slower
Aura v2 - first route 977 0.0000376436 +0.0000359508 2124% slower

$code .= "\n .'{$rx}',";
$state->regex .= $rx;

// if the regex is too large, throw a signaling exception to recompute with smaller chunk size
Copy link
Contributor

@ro0NL ro0NL Feb 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious... why not have (reasonably large) fixed limit instead? is it variable in case of PCRE? It could avoid any recomputing needed no?

edit: wait this is compile info.. right? It gives the best chunk limit possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I have absolutely no idea what would that mean in practice, and the current logic saves me from answering this question :)

Copy link
Contributor

@ro0NL ro0NL Feb 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The large fixture is a result of this choice i guess.. in theory that could break tests in different envs.. 🤔 should it be mocked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we should care about such concerns.

@nicolas-grekas nicolas-grekas merged commit ee8b201 into symfony:master Feb 14, 2018
nicolas-grekas added a commit that referenced this pull request Feb 14, 2018
…-grekas)

This PR was merged into the 4.1-dev branch.

Discussion
----------

[Routing] Handle very large set of dynamic routes

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Follow up of #26059; allows handling very long lists of dynamic routes.
Allows running https://github.com/tyler-sommer/php-router-benchmark seamlessly.

BTW, here are the result of running this benchmark:

R3 extension is not loaded. Skipping initialization for "Worst-case matching" test using R3.
R3 extension is not loaded. Skipping initialization for "First route matching" test using R3.
## Worst-case matching
This benchmark matches the last route and unknown route. It generates a randomly prefixed and suffixed route in an attempt to thwart any optimization. 1,000 routes each with 9 arguments.

This benchmark consists of 10 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name | Results | Time | + Interval | Change
--------- | ------- | ---- | ---------- | ------
Symfony4 - unknown route (1000 routes) | 995 | 0.0000085699 | +0.0000000000 | baseline
Symfony4 - last route (1000 routes) | 999 | 0.0000086754 | +0.0000001055 | 1% slower
FastRoute - unknown route (1000 routes) | 980 | 0.0000305154 | +0.0000219455 | 256% slower
FastRoute - last route (1000 routes) | 999 | 0.0000529922 | +0.0000444223 | 518% slower
Pux PHP - unknown route (1000 routes) | 972 | 0.0003162730 | +0.0003077032 | 3591% slower
Pux PHP - last route (1000 routes) | 999 | 0.0004376847 | +0.0004291148 | 5007% slower
Aura v2 - unknown route (1000 routes) | 976 | 0.0138277517 | +0.0138191818 | 161253% slower
Aura v2 - last route (1000 routes) | 989 | 0.0138914190 | +0.0138828491 | 161996% slower

## First route matching
This benchmark tests how quickly each router can match the first route. 1,000 routes each with 9 arguments.

This benchmark consists of 5 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name | Results | Time | + Interval | Change
--------- | ------- | ---- | ---------- | ------
FastRoute - first route | 999 | 0.0000016928 | +0.0000000000 | baseline
Pux PHP - first route | 999 | 0.0000017381 | +0.0000000453 | 3% slower
Symfony4 - first route | 997 | 0.0000029818 | +0.0000012890 | 76% slower
Aura v2 - first route | 977 | 0.0000376436 | +0.0000359508 | 2124% slower

Commits
-------

ee8b201 [Routing] Handle very large set of dynamic routes
@nicolas-grekas nicolas-grekas deleted the router-chunk branch February 14, 2018 12:08
@fredemmott
Copy link

fredemmott commented Feb 16, 2018

@nicolas-grekas: you can't use it directly as it's not PHP, but you might want to look at the prefix matching approach used by https://github.com/hhvm/hack-router ; our benchmarks put it at ~ 5x faster than fast-route's approach, which AIUI is what you've been following here:

$ hhvm naive-benchmark.php
Map has 3718 entries and 6639 URIs
Cold run for simple regexp...
... done (init: 252.22ms, lookups: 18540.45ms, per lookup: 2.79ms, estimated total per request: 255.02ms)
Cold run for uncached fastroute...
... done (init: 2836.94ms, lookups: 493.74ms, per lookup: 0.07ms, estimated total per request: 2837.02ms)
Cold run for cached fastroute...
... done (init: 2463.76ms, lookups: 386.84ms, per lookup: 0.06ms, estimated total per request: 2463.81ms)
Cold run for uncached prefix match...
... done (init: 263.23ms, lookups: 236.94ms, per lookup: 0.04ms, estimated total per request: 263.26ms)
Cold run for cached prefix map...
... done (init: 69.98ms, lookups: 77.89ms, per lookup: 0.01ms, estimated total per request: 69.99ms)
Warm run for simple regexp...
... done (init: 34.87ms, lookups: 17566.62ms, per lookup: 2.65ms, estimated total per request: 37.51ms)
Warm run for uncached fastroute...
... done (init: 2416.26ms, lookups: 389.40ms, per lookup: 0.06ms, estimated total per request: 2416.32ms)
Warm run for cached fastroute...
... done (init: 22.53ms, lookups: 395.27ms, per lookup: 0.06ms, estimated total per request: 22.59ms)
Warm run for uncached prefix match...
... done (init: 30.47ms, lookups: 80.80ms, per lookup: 0.01ms, estimated total per request: 30.49ms)
Warm run for cached prefix map...
... done (init: 4.15ms, lookups: 77.49ms, per lookup: 0.01ms, estimated total per request: 4.17ms)

In short, what we do is:

  • decide on a prefix length
  • build a struct with:
    • a map of prefix => node
    • a map of regex => node where the is no literal string prefix we can use
    • Node is either a responder, or another similar struct
  • when looking up, we fetch the prefix length - and then see if $map['prefixes'][substr($path, 0, $length)] is set - then return the responder or recurse into the sub-map. If not, check for a regex match. If that fails, fail.

This means that each common constant part of the URL can be looked up in constant time - but it will be linear in terms of number of parts, and where regexes are needed, that part will have usual regex complexity.

e.g. if we have /foo/bar => BarController, /foo/baz => BazController, and /herp/derp => DerpController, we'll make:

prefix len: 4
prefixes:
  /foo:
    prefix length: 4
    prefixes:
    /bar: BarController
    /baz: BazController
  /her:
    prefix length: 6
    prefixes:
      p/derp: DerpController

We'll then look up $map[$prefixes[substr($requested_path, 0, 4)]], then the appropriate subprefix.

If we have foo/bar/{id:\d+}, we'll use prefix matching for the 'foo/bar/' part, then a regexp match just for the id:\d+ part

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Feb 16, 2018

@fredemmott happy to see you here :) That's interesting. I looked at the code but didn't run it. From what I understand, I'm not sure this would provide additional performances, because in the end there is still a regex call, and also because the regex we use embeds same-prefix tree optimizations. I also fear this approach is not compatible with our in-order logic. I'd be happy to be wrong though :)
I started writing something on the topic to explain the new matcher in depth:
https://twitter.com/nicolasgrekas/status/964513325815074817
I'm just not at the same-prefix tree optimization explanations yet. Next post.

@fredemmott
Copy link

fredemmott commented Feb 16, 2018

So, there's no regexp call for static routes - only when there's a variable. We're also not using the regexp for any static prefixes before a variable.

The main advantage is that while the regexp is checking for common prefixes, we're looking them up with array key access; for large maps, this can be much faster.

This is definitely not representative of real world performance, but illustrates the difference:

<?php

// generate a bunch of /foo/{id} routes, then look them up

const LENGTH = 8;
const MAP_SIZE = 2500;
const LOOKUPS = 10000;

$random_parts = array_map(
  function() { return bin2hex(random_bytes(intdiv(LENGTH, 2))); },
  range(1, MAP_SIZE)
);

$prefix_map = [];
foreach ($random_parts as $part) {
  $route = '/'.$part.'/';
  $prefix_map[$route] = $part;
}
$regexp = '/^\/('.implode($random_parts, '|').')\/\d+$/';

$requests = array_map(
  function() use ($random_parts) {
    return [
      $random_parts[random_int(0, count($random_parts) - 1)],
      random_int(0, PHP_INT_MAX)
    ];
  },
  range(1, LOOKUPS)
);

// Optimize the requests for each map kind
$regexp_requests = array_map(
  function($request) {
    list($route, $id) = $request;
    return '/'.$route.'/'.$id;
  },
  $requests
);

$prefix_requests = array_map(
  function($request) {
    list($route, $id) = $request;
    return ['/'.$route.'/', (string) $id];
  },
  $requests
);

$regexp_results = [];
$prefix_results = [];

$start = microtime(true);
foreach ($regexp_requests as $request) {
  $matches = [];
  if (preg_match($regexp, $request, $matches) !== 1) {
    throw new Exception('failed to match regexp');
  }
  $regexp_results[] = [$matches[1], $matches[2]];
}
$regexp_time = microtime(true) - $start;

$start = microtime(true);
foreach ($prefix_requests as list($prefix, $id)) {
  $route = $prefix_map[$prefix] ?? null;
  if ($route === null) {
    throw new Exception('failed to match prefix');
  }
  $matches = [];
  if (preg_match('/^\d+$/', $id, $matches) !== 1) {
    throw new Exception('failed to match id regexp');
  }
  $prefix_results[] = [$route, $matches[1]];
}
$prefix_time = microtime(true) - $start;

printf(
  "Regexp time:\t%fs\nPrefix time:\t%fs\nEqual: %s\n",
  $regexp_time,
  $prefix_time,
  $prefix_results == $regexp_results ? 'yes' : 'no'
);
$ php test.php
Regexp time:	0.020579s
Prefix time:	0.001593s
Equal: yes

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Feb 16, 2018

@fredemmott it would be great if you could run the router we have on master and compare it to yours!
The main description in #26059 contains an example to bootstrap a simple bench. I'd be happy to read any numbers you get with it :)

@fredemmott
Copy link

As for order: we have some hybrid approaches that preserve it, but not publicly. For hack-router, we:

  • go for most-specific-first
  • if two routes are 'as specific' as far as we can tell (e.g. we don't prioritize {id:\d+} over '{id}`), we preserve order

This means that if your map is

.+:   WildcardController
/foo: FooController

We wont' preserve order: /foo will go to FooController, rather than WildcardController.

@fredemmott
Copy link

Sorry, not going to be able to do that any time soon; just tried, and the 'urgent' PHP7 compatibility list for HHVM just got a fair bit larger :/

In particular:

@nicolas-grekas
Copy link
Member Author

Have fun, and thanks for the new ideas :)

@nicolas-grekas
Copy link
Member Author

@fredemmott FYI, here is the kind of optims I'm was referring to: #26208.

@allyraza
Copy link

@fredemmott I wonder how your approach holds up against routes that have multiple variables for example /blog/{category}/{id}/{title}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants