Skip to content

allow utf-8 header for svg detection #2481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 15, 2021
Merged

allow utf-8 header for svg detection #2481

merged 5 commits into from
Oct 15, 2021

Conversation

jcupitt
Copy link
Member

@jcupitt jcupitt commented Oct 14, 2021

We were checking that the first 24 chars of an SVG were plain ASCII,
but that's not always the case, for example:

<svg id="レイヤー_1のコピー"
	data-name="レイヤー 1のコピー"
	xmlns="http://www.w3.org/2000/svg"
	viewBox="0 0 100 100">
</svg>

We now test for the string "<svg" being in the first 1000 bytes, and
everything up to that being valid utf-8.

See #2438

We were checking that the first 24 chars of an SVG were plain ASCII,
but that's not always the case, for example:

	<svg id="レイヤー_1のコピー"
		data-name="レイヤー 1のコピー"
		xmlns="http://www.w3.org/2000/svg"
		viewBox="0 0 100 100">
	</svg>

We now test for the string "<svg" being in the first 1000 bytes, and
everything up to that being valid utf-8.

See #2438
we could have read a byte or two after the end of the string if
haystack ended in part of needle and a utf-8 character straddled the
boundary
@jcupitt
Copy link
Member Author

jcupitt commented Oct 14, 2021

It's failing on this webp:

https://github.com/libvips/libvips/blob/master/test/test-suite/images/looks-like-svg.webp

A webp files that happens to contain the characters <svg, and also be valid utf-8 to that point.

Perhaps we should check all the first 1000 bytes are valid utf-8? @lovell, any thoughts?

@kleisauke
Copy link
Member

Alternatively, we could increase the priority of webpload, as that only needs to sniff 12 bytes from the header.

/* is_a() is not that quick ... lower the priority.
*/
foreign_class->priority = -50;

if( (p = vips_source_sniff( source, 12 )) &&

@lovell
Copy link
Member

lovell commented Oct 15, 2021

Are we sure that WebP image is valid UTF-8 up until the <svg bit?

$ head -c 588 looks-like-svg.webp | iconv -f UTF-8
RIFF(
WEBPVP8X
iconv: illegal input sequence at position 24

@jcupitt
Copy link
Member Author

jcupitt commented Oct 15, 2021

Ooof, you were right Lovell, it was my fault, you can't test for g_utf8_get_char_validated() returning -1 or -2 (the two error returns) using < 0, since gunichar is really a uint32. Unsigned int / signed int comparisons in C are the devil.

I've raised the priority of webpload as well -- as you say Kleis, it's very quick (much faster than svgload), so it should be near the top. It was on a very low priority before, perhaps we used to use a slow sniffer for it?

Anyway, it seems to work now!

it was very low priority before, for some reason
Copy link
Member

@lovell lovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge and see if OSS-Fuzz can find anything. Pleased to see looks-like-svg.webp is the test case that keeps on giving.

@jcupitt jcupitt merged commit b2527da into master Oct 15, 2021
@jcupitt jcupitt deleted the allow-utf8-svg-header branch October 15, 2021 12:21
@kleisauke
Copy link
Member

Note that OSS-Fuzz does not currently build against librsvg, so this change is not covered. I've looked into this in the past, but it requires adding these (sub-)dependencies as well:

Dependency-graph librsvg
  • gdk-pixbuf
  • cairo
    • pixman
    • fontconfig (recommend)
  • pango
    • cairo (recommend)
    • fontconfig
    • freetype
    • harfbuzz
    • fribidi
  • libxml2

@lovell
Copy link
Member

lovell commented Oct 15, 2021

@kleisauke Good spot, I can take a look at adding it. To start with we might be able to get away using apt to install some/all of these rather than build from source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants