-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Serializer] Add a data: URI normalizer #16164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
*/ | ||
public function denormalize($data, $class, $format = null, array $context = array()) | ||
{ | ||
if (!preg_match('/^data:(.*),(.*)/', $data)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a better regex (as suggested by @meyerbaptiste) such as https://gist.githubusercontent.com/bgrins/6194623/raw/2ace3be86fa54b72b2c10e013cbc37d79b222bd7/detectDataURL.js
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex in the PR is very slow (https://regex101.com/r/rC9nW6/2). The first improvement would be to make the first match lazy: '/^data:(.*?),(.*)/'
(https://regex101.com/r/rC9nW6/1)
The one from the gist linked by @dunglas is indeed not only the most exact, but also the fastest one: https://regex101.com/r/dL9cL4/1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the last part of the regex possesive should also help '/^data:(.*?),(.*+)/'
.
ping @symfony/deciders |
|
||
public function __construct(MimeTypeGuesserInterface $mimeTypeGuesser = null) | ||
{ | ||
if (null === $mimeTypeGuesser && class_exists('Symfony\Component\HttpFoundation\File\MimeType\MimeTypeGuesser')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class_exists()
call is useless (if the class is missing, the MimeTypeGuesserInterface
is missing too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not useless in the following case: new DataUriNormalizer()
(without argument).
The constructor argument is optional, so the existence of the interface will not be checked and you cannot know if MimeTypeGuesser
exists or no.
ping @symfony/deciders |
use Symfony\Component\Serializer\Exception\UnexpectedValueException; | ||
|
||
/** | ||
* Normalizes a {@see \SplFileInfo} object to a data URI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a SplFileInfo [...] no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an SplFileInfo afaik
Should be ok now. |
if ($object instanceof File) { | ||
$mimeType = $object->getMimeType(); | ||
} elseif ($this->mimeTypeGuesser) { | ||
$mimeType = $this->mimeTypeGuesser->guess($object->getPathname()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need a type-check on this to make sure it's an SplFileInfo
. I know that you pointed to supportsNormalization
as checking for this, but that's not good enough. Not being a normalizer expert, I just looked at this method at it confused me. It's not clear without checking for the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pro as you pointed out is clarity.
The con is the (little) performance impact.
As dealing with data:
URI is slow by definition, we can add it here.
I added some comments, but this generally looks quite good and interesting |
@weaverryan just introduced private methods as you suggested. I also added a check for the type as suggested by you and @xabbuh |
} | ||
|
||
$mimeType = $this->getMimeType($object); | ||
list($typeName) = explode('/', $mimeType, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minutiae detail, but I'd like this line moved right before the if
statement that actually uses $typeName
- it's not needed up here.
|
||
$mimeType = $this->getMimeType($object); | ||
list($typeName) = explode('/', $mimeType, 2); | ||
$object = $this->extractSplFileObject($object); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this $splFileObject
for clarity and so we're not overriding the previous variable (even though we don't need it)
I left some very very minor comments, but 👍 Status: Reviewed |
@weaverryan Thanks for the review! Last comments fixed. |
return new \SplFileObject($data); | ||
} | ||
|
||
throw new InvalidArgumentException(sprintf('The class parameter "%s" is not supported. It must be one of "SplFileInfo", "SplFileObject" or "Symfony\Component\HttpFoundation\File\File".', $class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO cleaner would be to throw this exception after try/catch block.
Thank you @dunglas. |
This PR was squashed before being merged into the 3.1-dev branch (closes #16164). Discussion ---------- [Serializer] Add a data: URI normalizer | Q | A | ------------- | --- | Bug fix? | no | New feature? | yes | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | n/a | License | MIT | Doc PR | not yet A new normalizer to transform `\SplFileInfo`, `\SplFileObject` and `\Symfony\Component\HttpFoundation\File\File` to [`data:` URI](https://en.wikipedia.org/wiki/Data_URI_scheme) and the opposite. It's convenient when dealing with the JavaScript `FileReader` API. Usage: ```php $normalizer = new DataUriNormalizer(); echo $normalizer->normalize(new File(__DIR__.'/test.gif')); // data:image/gif;base64,R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs= echo $normalizer->normalize(new File(__DIR__.'/test.txt')); // data:text/plain,K%C3%A9vin%20Dunglas%0A var_dump($normalizer->denormalize('data:image/gif;base64,R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs=', 'SplFileObject'); // a SplFileObject (also supports HttpFoundation file if the composant is installed) ``` cc @clementtalleu Commits ------- cc7b5af [Serializer] Add a data: URI normalizer
There is a practical problem with this: If the data URI is long (think few MiBs), it quickly takes up a lot of memory (easily tens or hundreds of MiBs) as it's passed by value through the Serializer... In my case I've had to write to a temp file very early to avoid the problem. Not sure if there's a more elegant solution... |
During the denormalization? It's "intended" as the current design is to handle everything in memory. Having support for on disk tmp file (or both, using the Anyway, I would recommend using a specific endpoint accepting Are you interested on opening a PR for that @teohhanhui ? |
@dunglas I don't think I have any solution to offer on this...
Did you mean serializing (unprocessed) data to disk during denormalization (to avoid duplicating the large string when repeatedly passed by value)? That seems really convoluted... Otherwise, if you mean only during the denormalization step of the data URI, I don't see how that can avoid the problem I mentioned above... |
Anyway, one is more likely to run into a problem with |
*/ | ||
public function denormalize($data, $class, $format = null, array $context = array()) | ||
{ | ||
if (!preg_match('/^data:([a-z0-9]+\/[a-z0-9]+(;[a-z0-9\-]+\=[a-z0-9\-]+)?)?(;base64)?,[a-z0-9\!\$\&\\\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*$/i', $data)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^data:([a-z0-9]+\/[a-z0-9]+(;[a-z0-9\-]+\=[a-z0-9\-]+)?)?(;base64)?,
[a-z0-9\!\$\&\\\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*$
...could be made posessive/greedy for better performance...
^data:([a-z0-9]++\/[a-z0-9]++(;[a-z0-9\-]++\=[a-z0-9\-]++)?)?(;base64)?,
[a-z0-9\!\$\&\\\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*+\s*+$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, is matching the string all the way to the end really necessary (with a large list of allowed characters)?
I mean we're not validating if the actual data corresponds with the encoding specification, the user could still (for example) use the base64 option but not send base64 encoded data.
We're taking the correctness of the data (more or less) for granted -- so why allow the regex engine to even bother scanning it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is two main reasons:
- this is very security sensitive (in my former implementation, it was allowing to access to /etc/shadow for instance and, being not a security expert, I prefer to be conservative - what about specials chars...)
- the PHP stream wrapper can generate an error if the data is invalid, the regex prevent it
However 👍 to make the regex as fast as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh-oh, i can't imagine how the /etc/shadow file could get exposed since the
prefix is predetermined, can you please give me some documentation?
On 20:14, Thu, Feb 4, 2016 Kévin Dunglas notifications@github.com wrote:
In src/Symfony/Component/Serializer/Normalizer/DataUriNormalizer.php
#16164 (comment):
return $data instanceof \SplFileInfo;
- }
- /**
\* {@inheritdoc}
*
\* Regex adapted from Brian Grinstead code.
*
\* @see https://gist.github.com/bgrins/6194623
*
\* @throws InvalidArgumentException
\* @throws UnexpectedValueException
*/
- public function denormalize($data, $class, $format = null, array $context = array())
- {
if (!preg_match('/^data:([a-z0-9]+\/[a-z0-9]+(;[a-z0-9-]+\=[a-z0-9-]+)?)?(;base64)?,[a-z0-9!\$&\\'\,()*+\,\;\=-._~:\@\/\?\%\s]_\s_$/i', $data)) {
There is two main reasons:
- this is very security sensitive (in my former implementation, it was
allowing to access to /etc/shadow for instance and, being not a security
expert, I prefer to be conservative - what about specials chars...)- the PHP stream wrapper can generate an error if the data is invalid,
the regex prevent itHowever [image: 👍] to make the regex as fast as possible
—
Reply to this email directly or view it on GitHub
https://github.com/symfony/symfony/pull/16164/files#r51913006.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dunglas the original RegEx does not match application/octet-stream
because the second part of the mime type is not allowed to contain a dash.
Which means the DataUriNormalizer can't match a data uri
that it produced (when no mime type info was available).
Also there's an issue in the normalizer
logic, the Symfony mime type guesser tries to fetch the path of the file and assert that the file exists. This does not work with php://memory
/ php://temp
or data:
type files.
I could submit a PR with the aforementioned speedups as well as some logic to make the mime type guessing part optional (ex.: supply a mime type via $context
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm checking out this list and apparently literal dots and underscores are also (sometimes) used in mime-type names (especially in the second part) and dashes are (sometimes) used in the first part.
So the RegEx needs to allow for those too.
This PR was squashed before being merged into the master branch (closes #6314). Discussion ---------- New normalizers | Q | A | ------------- | --- | Doc fix? | no | New docs? | yes symfony/symfony#17603 symfony/symfony#17411 symfony/symfony#16164 | Applies to | 3.1 | Fixed tickets | Commits ------- 4ff3a28 New normalizers
A new normalizer to transform
\SplFileInfo
,\SplFileObject
and\Symfony\Component\HttpFoundation\File\File
todata:
URI and the opposite.It's convenient when using the
FileReader.readAsDataURL()
in JavaScript.Usage:
cc @clementtalleu