Skip to content

[go1.20] Repeat go1.19 strings.Repeat for go1.20 #1324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

grantnelson-wf
Copy link
Collaborator

@grantnelson-wf grantnelson-wf commented Jul 17, 2024

Summary

Overriding strings.Repeat with go1.19's implementation because it is significantly faster than the go1.20 implementation and JS's String.replace gives different results.

The CI will still fail for go1.20 but now it will fail faster. This is part of #1270

Full Explanation

Problem with go1.20 repeat

In the go1.20 implementation of strings.Repeat the function was changed to use chunks with a limited size because:

Past a certain chunk size it is counterproductive to use larger chunks as the source of the write, as when the source is too large we are basically just thrashing the CPU D-cache. So if the result length is larger than an empirically-found limit (8KB), we stop growing the source string once the limit is reached and keep reusing the same source string - that should therefore be always resident in the L1 cache - until we have completed the construction of the result. This yields significant speedups (up to +100%) in cases where the result length is large (roughly, over L2 cache size).

Unfortunately, that change makes the GopherJS run some tests significantly slower. One run of encoding/pem TestCVE202224675 took 6224.85s (about 1.7 hours) to finish! That test checks the fix for CVE-2022-24675 which deals with a "stack overflow via a large amount of PEM data". To get that stack overflow, the test starts with

input := []byte(strings.Repeat("-----BEGIN \n", 10000000))

that creates a 120MB string. With the change to go1.20's strings.Repeat limiting the chunk size to 8KB, meaning the inner loop will loop about 15,000 times.

In the go1.19's implementation of strings.Repeat there is no limit so the string is doubled in the loop (exponential growth). This means the inner loop will loop only 24 times. The number of concatenations and loops makes a huge difference in JS, where as the chunk size doesn't. One run of TestCVE202224675 took 24.68s. That's means the go1.19 implementation is 250 times faster than (takes 0.4% the amount of time as) the go1.20 implementation in GopherJS.

Problem with JS repeat

Since we had to roll back the go1.20 implementation of strings.Repeat to go1.19 anyway, I thought I'd see if the JS String.repeat would be faster. I tried

func Repeat(s string, count int) string {
	switch {
	case count == 0:
		return ""
	case count < 0:
		panic("strings: negative Repeat count")
	case len(s)*count/count != len(s):
		panic("strings: Repeat count causes overflow")
	}

	return js.InternalObject(s).Call("repeat", count).String()
}

This does appear to be slightly faster, but not significantly faster, than go1.19. One run of TestCVE202224675 took 19.42s, meaning it is 1.27 times faster than (take 80% the amount of time as) the go1.19 implementation.

Unfortunately, the JS's implementation returns slightly different results. A few tests fail using JS's implementation. For example hash/adler32 TestGolden fails 8 of it's scenarios, specifically those using strings.Repeat. One failed scenario creates the input with

strings.Repeat("\xff", 5548) + "8"

The resulting string from JS has 11,097 characters (not 5,549 like expected). The bytes in the string are [195 191 195 191 ... 195 191 195 191 56]. The repeating phrase 195 191 is because JS converts "\xFF" into the UTF-8 representation of "ÿ" ("\xC3\xBF"). Go treats strings as if they are pre-escaped for UTF-8 and simply repeats the bytes as they are, so the result is 5,549 characters with the byes [255 255 ... 255 255 56].

I didn't check if there was a way to get JS to repeat without doing the additional UTF-8 escaping. I took this as an indication that we should simply roll back the go1.20 implementation to the go1.19 implementation to start with. Further investigation into a JS implementation can be done later.

Copy link
Member

@nevkontakte nevkontakte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@nevkontakte nevkontakte merged commit 1cf45be into gopherjs:go1.20 Jul 17, 2024
1 of 2 checks passed
@grantnelson-wf grantnelson-wf deleted the repeatStringRepeat branch July 17, 2024 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants