Skip to content

symfony/routing compiled route fails matching with pre-defined subroutines requirements in (?(DEFINE)(...)) format #47456

Closed
@Levivb

Description

@Levivb

Symfony version(s) affected

6.1.3

Description

The Route class provides the ability to pass a $requirements parameter which dictates which format an uri parameter should conform too. It's possible to define a regex.

For example for route /test/{uuid}, the requirement '^[a-f\d]{8}(?:-[a-f\d]{4}){3}-[a-f\d]{12}$' can be provided. Which ensures {uuid} conforms to the given regular expression which validates for a uuid.

The regular expression syntax facilitates a way to define Pre-Defined Subroutines before the actual match is executed in a (?(DEFINE) block (which should always precede anything other in the regex). See https://www.rexegg.com/regex-disambiguation.html#define

The RouteCompiler doesn't take this aspect into account as it will compile an invalid regular expression which will never match. Causing a silent failure of an otherwise valid request. Pre-Defined Subroutines in regex are not really well known, which could explain why this hasn't popped up before.

How to reproduce

<?php

declare(strict_types=1);

// First, run "composer require symfony/routing"
// Then, execute this file:

require_once __DIR__.'/vendor/autoload.php';

use Symfony\Component\Routing\Route;

$uuid = 'b7284e8b-0756-419a-a003-8bbfc39c4f78';
$matchingUri = '/test/'.$uuid;

// Here we'll construct a 'simple' regex to match the `{uuid}` part in the route against an uuid
$simpleUuidPattern = '^[a-f\d]{8}(?:-[a-f\d]{4}){3}-[a-f\d]{12}$';
$simpleRoute = new Route('test/{uuid}', [], ['uuid' => $simpleUuidPattern], ['utf8' => true], '', [], ['GET', 'HEAD']);
$simpleRouteRegex = $simpleRoute->compile()->getRegex();

echo "Given route regex: " . $simpleUuidPattern . PHP_EOL;
echo "Compiled route regex: " . $simpleRouteRegex . PHP_EOL;
echo "Matching against {$matchingUri}" . PHP_EOL;
echo "Matching a 'simple' regex. Match result: " . var_export(preg_match($simpleRouteRegex, $matchingUri), true) . PHP_EOL;
echo "Result matches as expected" . PHP_EOL;

echo PHP_EOL . PHP_EOL;

// Here we'll construct a 'complex' regex to match the `{uuid}` part in the route against an uuid
$complexUuidPattern = '(?(DEFINE)(?<uuidpart>[a-f\d]{4}))^(?&uuidpart){2}(?:-(?&uuidpart)){3}-(?&uuidpart){3}$';
$complexRoute = new Route('test/{uuid}', [], ['uuid' => $complexUuidPattern], ['utf8' => true], '', [], ['GET', 'HEAD']);
$complexRouteRegex = $complexRoute->compile()->getRegex();

echo "Given route regex: " . $complexUuidPattern . PHP_EOL;
echo "Compiled route regex: " . $complexRouteRegex . PHP_EOL;
echo "Matching against {$matchingUri}" . PHP_EOL;
echo "Matching a 'complex' regex. Match result: " . var_export(preg_match($complexRouteRegex, $matchingUri), true) . PHP_EOL;
echo "Result does not match expectation" . PHP_EOL;

echo PHP_EOL . PHP_EOL;

// Here we'll plainly match the `$complexUuidPattern` against the defined `$uuid` to proof that it does really match
echo "Given regex: " . $complexUuidPattern . PHP_EOL;
echo "Matching against {$uuid}" . PHP_EOL;
echo "Matching a 'complex' regex against the plain uuid. Match result: " . var_export(preg_match('~' . $complexUuidPattern . '~', $uuid), true) . PHP_EOL;
echo "Result matches as expected" . PHP_EOL;

Result:

Given route regex: ^[a-f\d]{8}(?:-[a-f\d]{4}){3}-[a-f\d]{12}$
Compiled route regex: {^/test/(?P<uuid>[a-f\d]{8}(?:-[a-f\d]{4}){3}-[a-f\d]{12})$}sDu
Matching against /test/b7284e8b-0756-419a-a003-8bbfc39c4f78
Matching a 'simple' regex. Match result: 1
Result matches as expected


Given route regex: (?(DEFINE)(?<uuidpart>[a-f\d]{4}))^(?&uuidpart){2}(?:-(?&uuidpart)){3}-(?&uuidpart){3}$
Compiled route regex: {^/test/(?P<uuid>(?(DEFINE)(?<uuidpart>[a-f\d]{4}))^(?&uuidpart){2}(?:-(?&uuidpart)){3}-(?&uuidpart){3})$}sDu
Matching against /test/b7284e8b-0756-419a-a003-8bbfc39c4f78
Matching a 'complex' regex. Match result: 0
Result does not match expectation


Given regex: (?(DEFINE)(?<uuidpart>[a-f\d]{4}))^(?&uuidpart){2}(?:-(?&uuidpart)){3}-(?&uuidpart){3}$
Matching against b7284e8b-0756-419a-a003-8bbfc39c4f78
Matching a 'complex' regex against the plain uuid. Match result: 1
Result matches as expected

Possible Solution

  1. I understand the complexity of supporting this structure at all. So not supporting it is an option. It would be nice if the lack of support would be made explicit by throwing an exception when the (?DEFINE)(...)(...) structure is detected in one of the requirements. (this would showcase failure on (automatic) refactorings in userland code)

  2. Another solution would be to run a pre-parser over each requirement, unwinding the Pre-Defined Subroutines into actual plain matching parts. Transpiling (?(DEFINE)()) to it's fully written out form, so to speak. Although that could become complex pretty fast. This would however provide a transparent way of supporting pre-defined subroutines in requirements.

    1. Starting regex: (?(DEFINE)(?<uuidpart>[a-f\d]{4}))^(?&uuidpart){2}(?:-(?&uuidpart)){3}-(?&uuidpart){3}$
    2. Break up into parts:
      • '(?(DEFINE)(?<uuidpart>[a-f\d]{4}))
      • ^(?&uuidpart){2}(?:-(?&uuidpart)){3}-(?&uuidpart){3}$
    3. Break up pre-defined subroutines - this would be the hard part if done with a regex itself due to complex nesting of ( ) and escaped \( and \). A parser would probably be more suitable here. Result: ['uuidpart' => '[a-f\d]{4}']
    4. Replace group references in the regex with a non-capture group containing the unfolded pre-defined subroutine: ^(?:[a-f\d]{4}){2}(?:-(?:[a-f\d]{4})){3}-(?:[a-f\d]{4}){3}$
  3. A third solution would be move all the pre-defined subroutines to the beginning of the compiled regular expression. This would require merging the pre-defined subroutines of all requirements into one (?(DEFINE) block. Special care needs to be put into preventing naming conflicts. Imagine a route path /test/{token}/{id} where {token} defines (?(DEFINE)(?<rule>\w+)) and {id} defines (?(DEFINE)(?<rule>\d+)), then the compiled regex should become something like (?(DEFINE)(?<ruletoken>\w+)(?<ruleid>\d+)), also replacing (?&rule) with (?&ruletoken) in the token regex and (?&rule) with (?&ruleid) in the id regex of course.

Additional Context

Related material: #26524

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions