
[DomCrawler] Do not rely on mbstring.substitute_character #60305


Closed
lyrixx opened this issue Apr 30, 2025 · 1 comment

@lyrixx
Member

lyrixx commented Apr 30, 2025

Symfony version(s) affected

all

Description

Woooo, I ran into a very hard-to-find bug!
I use bopoda/robots-txt-parser, and it hardcodes something very strange in one of its classes:

// Strip invalid characters from UTF-8 strings
ini_set('mbstring.substitute_character', "none");
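To make the effect concrete, here is a minimal sketch (assuming the mbstring extension is loaded; the variable names are illustrative) of what that `ini_set()` call changes. `mbstring.substitute_character` is a process-wide setting, so once it is `"none"`, every later `mb_convert_encoding()` call in the same process silently drops invalid bytes instead of replacing them with the substitute character.

```php
<?php
// "a\xFFb" contains one byte (0xFF) that is never valid in UTF-8.
$input = "a\xFFb";

// Default behaviour: invalid sequences are replaced by the substitute character '?' (U+003F).
mb_substitute_character(0x3F);
$withSubstitute = mb_convert_encoding($input, 'UTF-8', 'UTF-8');

// What the robots-txt-parser line does: the change is global, so any code that runs
// afterwards (including a library such as DomCrawler) gets the "drop" behaviour.
ini_set('mbstring.substitute_character', 'none');
$withNone = mb_convert_encoding($input, 'UTF-8', 'UTF-8');

var_dump($withSubstitute, $withNone);
```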

I have already opened an issue there to get it fixed.

But I think we could protect Symfony against this kind of issue.

How to reproduce

I created a small reproducer.
Comment or uncomment the following line and run the script:

new RobotsTxtParser('');

You'll see that the output differs.

Possible Solution

Force the setting in our own code:

ini_set('mbstring.substitute_character', "");
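A variant of this idea that does not leave a global side effect of its own would be to pin the substitute character only around the conversion and restore the previous value afterwards. This is a sketch under stated assumptions: it assumes the affected code path goes through `mb_convert_encoding()`, `convertWithSubstitute()` is a hypothetical helper (not Symfony's API), and it uses PHP 8 union types.

```php
<?php
/**
 * Hypothetical helper: converts $input while temporarily pinning the
 * substitute character, then restores whatever the process had before.
 */
function convertWithSubstitute(string $input, string $encoding = 'UTF-8', int|string $substitute = 0x3F): string
{
    // Current value: an int code point, or "none" / "long" / "entity".
    $previous = mb_substitute_character();
    mb_substitute_character($substitute);

    try {
        return mb_convert_encoding($input, $encoding, $encoding);
    } finally {
        // Restore the global setting even if the conversion throws.
        mb_substitute_character($previous);
    }
}

// Simulate a third-party library changing the global setting.
ini_set('mbstring.substitute_character', 'none');

$result = convertWithSubstitute("a\xFFb");
var_dump($result, mb_substitute_character());
```

Compared with a plain `ini_set()`, restoring the previous value means Symfony would not, in turn, change behaviour for other code that deliberately relies on `"none"`.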

Additional Context

I'm not sure we should do this, but let's discuss it!

@tim-lappe

I was able to reproduce it with your code, but IMO this issue is only related to robots-txt-parser.
I don't think we should fix it in Symfony that way.

@lyrixx closed this as not planned on May 14, 2025.

3 participants