Skip to content

New DSN component #36999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

New DSN component #36999

wants to merge 6 commits into from

Conversation

Nyholm
Copy link
Member

@Nyholm Nyholm commented May 28, 2020

Q A
Branch? master
Bug fix? no
New feature? yes
Deprecations? no
Tickets
License MIT
Doc PR See documentation.md

A few years ago I created nyholm/dsn. This PR is building on to the work started in that library.

I've seen discussions in about a new DSN component started by @Simperfit. He points out that 7 components is using some form of DSN. I've also seen @jderusse PR in #24267 where he creates a specific DSN factory class for each component.

What I found interesting is @csarrazi's comment (#32546 (comment)) saying "there is not clear spec for DSNs". I have also not found a definition myself so I thought I would start by defining that a DSN is. I looked on all uses of DSN in symfony/symfony. I later broaden my search to DSN in PHP and continued to other languages. I came up with the following definition based on RFC 3986:

dsn_string:
  { function | dsn }

function:
  function_name[:](dsn_string[,dsn_string])[?query]

function_name:
  REGEX: [a-zA-Z0-9\+-]+

dsn:
  { scheme:[//]authority[path][?query] | scheme:[//][userinfo]path[?query] | host:port[path][?query] }

// ...

(See full definition in the docs)

Please correct me if I'm wrong and there is a DSN RFC somewhere.

Implementation

As you may see from the implementation or from how Symfony is using DSNs, the new component is basically just a bit more advanced parse_url() with a few benefits:

  • Comes with a definition
  • Better exceptions
  • Automatically url-decodes username and passwords
  • Value objects instead of arrays for the url parts
  • Handles all kinds of functions
  • Supports DSNs like: memcached://user:pass@/var/run/memcached.sock

Documentation

I've started a page that could be converted to an PR to symfony/symfony-docs. I'll refer to that page as detailed PR description. See documentation.md.

Usage

There are 2 methods for parsing; DsnParser::parse() and DsnParser::parseFunc().
The latter is for in situations where DSN functions are supported.

$dsn = DsnParser::parse('scheme://127.0.0.1/foo/bar?key=value');
echo get_class($dsn); // "Symfony\Component\Dsn\Configuration\Url"
echo $dsn->getHost(); // "127.0.0.1"
echo $dsn->getPath(); // "/foo/bar"
echo $dsn->getPort(); // null

If functions are supported (like in the Mailer component) we can use DsnParser::parseFunc():

$func = DsnParser::parseFunc('failover(sendgrid://KEY@default smtp://127.0.0.1)');
echo $func->getName(); // "failover"
echo get_class($func->first()); // "Symfony\Component\Dsn\Configuration\Url"
echo $func->first()->getHost(); // "default"
echo $func->first()->getUsername(); // "KEY"

Implementation in existing components

I've started implementation on Mailer and Messenger component. As you can see it will not remove loads of complexity from the component's logic. It just removes the parsing logic, you will still need to check if ie the "port" is set or not.

I've also started to implement this on the Redis Cache adapter. I stopped half way because there was not much logic removed and the diff was pretty much 1:1.

Where I believe this component would be helpful for the tests for any components. There is no longer any need to test the parsing of different DSNs.

Related PRs

This PR will fix #32546
It will also make it simple to fix:

Future work

I've been working on this PR time to time the last week. I would like to get more feedback and comments on the current implementation.
Maybe the value objects could possibly be improved by adding default values to the getters?

  • Get feedback and adapt to comments
  • Add some more tests for edge cases
  • Fix tests on other components

@Nyholm Nyholm requested a review from sroze as a code owner May 28, 2020 19:40
@Nyholm Nyholm force-pushed the dsn branch 2 times, most recently from 20f51d1 to 9d40c78 Compare May 28, 2020 19:43
@Nyholm Nyholm marked this pull request as draft May 28, 2020 19:48
@Nyholm Nyholm force-pushed the dsn branch 2 times, most recently from c9a3830 to d01c068 Compare May 28, 2020 20:55
@fabpot
Copy link
Member

fabpot commented May 29, 2020

Thank you for working on this. I haven't read the code yet, but I think parseSimple should be the default, so should be renamed to parse. And parse would be something like parseFunc.

@Nyholm
Copy link
Member Author

Nyholm commented May 29, 2020

Thank you for the suggestion. I've been considering about doing that after I saw that parsing strings without functions will be way more commonly used.

I've updated the PR and description accordingly.

@javiereguiluz
Copy link
Member

Tobias, thanks for working on this proposal. I haven't looked at the code yet either ... but the parse() + parseSimple() division is very unfortunate:

  • It increases "learning curve" and "cognitive load" of using this. What is "simple"? When to use it? I need to read the docs ... this component is no longer self-explanatory 😢
  • I don't understand why this is necessary. The "simple" parser already takes into account if there's a username or not a password or not a port or not ... so it could take into account if there are functions or not.
  • If we have only one parse() method, we have two possible scenarios:
    • I know if my DSN has functions or not ... this is the best case, I can write the needed code.
    • I don't know if the DSN has functions ... parse() should provide a couple of methods (hasFunctions(), getFunctions()) to deal with this.

@Nyholm
Copy link
Member Author

Nyholm commented May 29, 2020

Yes. I was not a big fan of the word “simple” either. I’ve renamed it after Fabien’s suggestion.

If we are using only one “parse” method then there are two scenarios:
A) the method will return a Dsn or a DsnFunction. This will make the signature ambiguous and you will always need to check what type of object you got back

B) we will always return a DsnFunction and for scenarios where you don’t support functions (which is “most” in symfony/symfony) you need to check if the function name is “dsn” and throw an exception of it isn’t.

A dsn string is per definition (that I concluded and wrote down) recursive. That is why the value object needs to be recursive too. With other words: using a “getFunctions()” will not make any sense. It will always be zero or one items returned and there is not much you can do but calling “getMostOuterFunction()” and then ask for the values for one function at the time.

@Simperfit
Copy link
Contributor

Thanks for working on this Tobias !

@javiereguiluz
Copy link
Member

@Nyholm let me share some code samples. Would this work?

I know the DSN structure:

// no functions
$dsn = DsnParser::parse('scheme://127.0.0.1/foo/bar?key=value');
$host = $dsn->getHost();

// single function
$dsn = DsnParser::parse('foo(scheme://127.0.0.1/foo/bar?key=value)');
$host = $dsn->getFunction()->first()->getHost();

// multiple functions
$dsn = DsnParser::parse('foo(bar(scheme://127.0.0.1/foo/bar?key=value))');
$host = $dsn->getFunction()->first()->getFunction()->first()->getHost();

I don't know the DSN structure:

if ($dsn->hasFunctions()) {
    do {
        $dsn = $dsn->getFunction()->first();
    } while ($dsn->hasFunctions());
}

$host = $dsn->getHost();

@Nyholm
Copy link
Member Author

Nyholm commented May 29, 2020

Thank you for a clear example. That works, but there is not type safety.

In the current state of the PR we got:

DsnParser::parse(string $dsn): Dsn
DsnParser::parseFunc(string $dsn): DsnFunction

To implement your change, we need to:

/**
 * @return Dsn|DsnFunction
 */
DsnParser::parse(string $dsn)

Since a DSN is most likely a user input, we do not know how it looks like. We only know if we expect "zero functions" or "zero or more functions".

@jderusse
Copy link
Member

Dsn and Function shares few attributes (scheme and querystring, and maybe path)
What's about having Function extends Dsn in the same way you have Path extends Dsn?

*
* @throws SyntaxException
*/
public static function parseFunc(string $dsn): DsnFunction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't that return Dns|DnsFunction instead of a fake function named dsn ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would force the user to always check what object they got back.

$dsn = DsnParser::parseFunc($userInput);
if ($dsn instanceof DsnFunction) {
  // Handle function logic
  // Foreach (arguments as argument)
      // handle single argument
} else {
  // Handle single argument
}

But with constant return types:

$dsn = DsnParser::parseFunc($userInput);
  // Foreach (arguments as argument)
      // handle single argument

@javiereguiluz
Copy link
Member

Here's a different proposal. What if we forget about "functions"?

Don't forget that "DSN as a general purpose config" is something that Symfony (probably) has invented. If you Google around, you can't find many DSN used like these. Others only use DSN for databases.

So, instead of this:

dummy://a
failover(dummy://a dummy://b)
roundrobin(dummy://a dummy://b dummy://c)
roundrobin(dummy://a failover(dummy://b dummy://c) dummy://d)

Why not do this:

dummy://a
dummy://a?failover, dummy://b?failover
dummy://a?roundrobin, dummy://b?roundrobin, dummy://c?roundrobin
dummy://a?roundrobin, dummy://b?roundrobin&failover, dummy://c?roundrobin&failover, dummy://d?roundrobin

Parser would return either a Dsn object or a Dsn[] array.

This new format would simplify defining for example different failover mechanisms, etc:

dummy://a?failover=vip, dummy://b?failover=vip, dummy://b?failover=normal

@Nyholm
Copy link
Member Author

Nyholm commented May 29, 2020

Dsn and Function shares few attributes (scheme and querystring, and maybe path)

I've been considering this too. But they dont not share these attributes.
The only thing a Dsn and DsnFunction has in common is parameters (from a query string).

@Nyholm
Copy link
Member Author

Nyholm commented May 29, 2020

Thank you @javiereguiluz. Skipping functions would make things more simple.

Parser would return either a Dsn object or a Dsn[] array.

Returning an object or an array still means that you always need to verify the type that was returned.

The new format is maybe something we can work with. It is not as simple and easy for the users, but it might work. But how do you handle nested functions in an intuitive way?

failover(smtp://a fallback(smtp://b smtp://c) smtp://d)) 

@dunglas
Copy link
Member

dunglas commented May 29, 2020

DSN means Data Source Name, as pointed out by @csarrazi, it's not a standard and there are a lot of DSN formats used in the wild, mostly for historical reasons.

However, as you pointed out, there is a well known and very popular standard for identifiers and structured identifiers. It's RFC 3986 (URI).

A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource.

URIs are a subset of IRIs (a URI can be dereferenced, a IRI may not be dereferencable, for instance urn:uuid:513a7707-c3ad-48bd-a3d9-ce42e327377c is an IRI but not an URI, because it's a valid IRI identifier, but it doesn't tell how to locate this resource). URLs and URIs are (now) synonyms.

Most modern DSN are in fact URIs.

URIs and IRIs are at the core of the Web and are also used as identifiers for non-Web resources (XML/JSON-LD namespaces, UUIDs, ISBN, EAN).

As Symfony is committed to implement open standards, I think that we should only use valid URIs as identifiers in Symfony (it also allows to parse and build them from most programming language, as there are libraries to manipulate URIs in all ecosystems).
URIs have a lot capabilities, and there are no good reasons for using non-standard things instead.

To me we shouldn't create a DSN component, but a URI (or IRI if we're ambitious) component. PHP has builtin function to manipulate URIs. There also is https://uri.thephpleague.com/ which is pretty good.

To create such component, there two issues main to solve:

  • "functions" aren't valid syntax according to the URI spec. It's a "proprietary", non-standard syntax. Instead of foo(scheme://bar), a valid URI should be like scheme://bar?apply=foo (there are many possible variations, can you share some example of "functions" used in the wild so we can find the corresponding standard URI?). I think we should deprecate the use of this syntax in Symfony and switch to valid URIs instead.
  • The URI and IRIs specs are complex. They also are extended by other RFC (such as URI Template). Implementing the whole specs is hard. But we could provide a minimal (but valid) implementation at first, without supporting all URI features, and extend the coverage in next versions.

@dunglas
Copy link
Member

dunglas commented May 29, 2020

For instance, here you can see how @jderusse used the URI standard to allow passing advanced options in the Mercure's transport DSN: https://mercure.rocks/docs/hub/config#bolt-adapter

@Nyholm
Copy link
Member Author

Nyholm commented May 29, 2020

Thank @dunglas. I've done much of the same research as you just showed.

I tried to add a section (see "Implementation" in PR description) to sum up what this component is really doing.

From a specification point of view, this component builds on the URI specification (RFC 3986) and adds support for functions and memcached://user:pass@/var/run/memcached.sock (paths with user/password).
I added those 2 things because that is how Symfony is currently uses DSNs.

If we decide that Symfony should stop using those 2 things, then I agree that this component should not be merged. We should rather use parse_url() or league/uri-parser (and remember to urldecode() user/password).

@dunglas
Copy link
Member

dunglas commented May 29, 2020

@javiereguiluz the URI spec reserves a lot of delimiters and sub-delimiters for this kind of use cases:

reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="

snip

The purpose of reserved characters is to provide a set of delimiting
characters that are distinguishable from other data within a URI.

https://tools.ietf.org/html/rfc3986#section-2.2

Example:

Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax. URI producing applications
often use the reserved characters allowed in a segment to delimit
scheme-specific or dereference-handler-specific subcomponents. For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. The comma (",") reserved character is often used for
similar purposes. For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same. Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific to
the implementation of the URI's dereferencing algorithm.

https://tools.ietf.org/html/rfc3986#section-3.3

For instance, dummy://a?failover=vip, dummy://b?failover=vip, dummy://b?failover=normal could become dummy://my-authority/a;failover=vip,b;failover=vip,b;failover=normal.

@Nyholm the builtin PHP functions are a bit low level (and don't cover 100% of the spec) and I'm not sure that's a good idea to rely on an external component for something as low level as parsing our own DSNs. So IMHO it makes sense to add a URI component in Symfony. And it could be useful for a lot of other projects (for instance API Platform use URIs and IRIs a lot, and we would benefit from such component too).

@csarrazi
Copy link
Contributor

My main argument against a DSN component is that many DSN formats are not compatible with the format used as an example in RFC 3986. For example, here are DSNs for database access for various languages:

  • PEAR_DB (PHP): mysql://john:pass@localhost:3306/my_db
  • PDO (PHP): mysql:host=localhost;dbname=example
  • DBI (Perl): DBI:Pg:database=finance;host=db.example.com;port=$port
  • ODBC (MS): Driver={ODBC Driver 13 for SQL Server};server=localhost;database=WideWorldImporters;trusted_connection=Yes
  • JDBC (Java): jdbc:odbc:HY_DSN, jdbc:as400://server-name:server-port/, jdbc:teradata://server-name:server-port/database-server-name
  • cx_oracle (Python): dbhost.example.com/orclpdb1
  • LDAP (RFC 2255): ldap://host:port/basedn?attr?scope?filter
  • MongoDB (language-independent): mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]

As you can see, this is both vendor and library-dependent. I do think that it would make sense to have a URI component, which could be reused for some DSN implementations, but not a DSN component per se.

It is also important to note that some database providers try not to expose any credentials-related information in their DSNs, which are often only named connections, which are fetched from various places (tnsnames.ora, ODBC manager, etc.).

@Nyholm
Copy link
Member Author

Nyholm commented May 29, 2020

For example, here are DSNs for database access for various languages:

I'll comment these one by one.

PEAR_DB (PHP): mysql://john:pass@localhost:3306/my_db

Supported by the DSN component.

PDO (PHP): mysql:host=localhost;dbname=example

Not supported, but mysql:?host=localhost&dbname=example is supported.
We could add support for this if we feel it is important.

DBI (Perl): DBI:Pg:database=finance;host=db.example.com;port=$port

Not supported

ODBC (MS): Driver={ODBC Driver 13 for SQL Server};server=localhost;database=WideWorldImporters;trusted_connection=Yes

Not supported

JDBC (Java): jdbc:odbc:HY_DSN, jdbc:as400://server-name:server-port/, jdbc:teradata://server-name:server-port/database-server-name

Not supported

cx_oracle (Python): dbhost.example.com/orclpdb1

Supported by the DSN component.

LDAP (RFC 2255): ldap://host:port/basedn?attr?scope?filter

Supported by the DSN component.

MongoDB (language-independent): mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]

Supported by the DSN component, but could be added if we feel it is important

@nicolas-grekas
Copy link
Member

I'm not sure I agree about an Uri component. The domain looks pretty different to me and this would restrict the usefulness of the component a lot.

Actually, I would like to suggest adding connection factories to the Dsn component. We're already using the Cache component as a dsn-to-connection factory: to me that's a big part of the reusability need. We could also benefit from a connection registry, indexed by dsn; and from providing a lazyness mechanism that would prevent requiring proxy-manager for dealing with this concern.

My 2cts :)

@kralos
Copy link

kralos commented Jun 16, 2020

I have an edge case for tests; MongoDB connection URIs: https://docs.mongodb.com/manual/reference/connection-string/

Specifically, they can list multiple hosts and duplicate querystring parameters. mongodb://fab:pot@db1.symfony.com:27017,db2.symfony.com:27017,db3.symfony.com:27017/myDb?replicaSet=prod&readPreferenceTags=eu&readPreferenceTags=us

As for actually using this component in the context of mongodb; since mongodb drivers accept a uri already, there is no reason to parse one. The string is simply passed directly to the driver.

@fabpot
Copy link
Member

fabpot commented Sep 4, 2020

I think we need to take a decision here. Do we want a generic DSN component or are we happy with the current situation?

@javiereguiluz
Copy link
Member

My 2 cents: we're probably not happy with the current situation (too many different parts of Symfony using DSN) ... but we aren't 100% happy with a DSN component either because it looks like there's a lot of different possible DSN formats, so abstracting is hard.

@sroze
Copy link
Contributor

sroze commented Sep 30, 2020

I do agree with @csarrazi & @dunglas that it makes more sense (to me) to have a URI component. This is a known format with its RFC (3986) and is very likely what we should push forward for our own DSNs. If we go towards a generic DSN component we will have no reason for refusing the likes of ODBC MS and that can quickly become very bloated for very particular edge cases.

@fabpot
Copy link
Member

fabpot commented Sep 30, 2020

I forgot about the URI idea. I think having a URI component would solve this issue nicely (at least as a first step).

@devnix
Copy link

devnix commented Sep 30, 2020

What about a DSN contract, and a DSN component with several implementations that are reused across several components? For example: PdoDsn, RedisDsn, OdbcDsn, UriDsn, MailerDsn, SymfonyDsn.

If some package needs a custom Dsn then it can live in his own package, implementing the contract from the package, and not requiring the implementations package. Maybe this could be a good way of implementing for example ODBC edge cases in his very own class.

If I need something not standard at all I still can implement the contract with my very own class. I don't know if almost every DSN can be parsed using regular expressions, or if a lexer could be the option to help each DSN implement his custom parser.

Also, there could be a factory that takes an array of DSN implementations in his constructor. Then you could pass it a DSN, and try to guess which specific implementation is by iterating all the registered classes and returning the one which matched first.

I'm unsure if this last idea could be possible, but this is a rough example:

$dsnFactory = new Symfony\Component\Dsn\Factory([
    Symfony\Component\Dsn\PdoDsn::class,
    Symfony\Component\Dsn\RedisDsn::class,
    Symfony\Component\Dsn\OdbcDsn::class,
    Symfony\Component\Dsn\UriDsn::class,
    Symfony\Component\Dsn\MailerDsn::class,
    Symfony\Component\Dsn\SymfonyDsn::class,
    App\Dsn\MyCustomDsn::class,
]);

$dsnFactory->create('odbc:Driver={Microsoft Access Driver (*.mdb)};Dbq=C:\\db.mdb;Uid=Admin'); // Symfony\Component\Dsn\OdbcDsn instance

If this is possible, the DSN contract may have a static test(string $dsn): bool method and the factory could use it for each DSN class registered in the constructor.

@Nyholm
Copy link
Member Author

Nyholm commented Oct 3, 2020

Thank you for all the reviews and feedback.

I see that we are not convinced that this PR should be merged. I will release this PR as version 2.0 of nyholm/dsn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DSN] Add a new DSN component