-
Notifications
You must be signed in to change notification settings - Fork 788
latest "prevent large objects from being read into memory" commit caused "object not found" error #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
detailed analysis please refer to my comments in #305 |
the codes in trouble are:
here:
so it looks like the "h, err := p.objectHeaderAtOffset(1836404)" in fsobject.go returns an invalid header which has a default Reference Hash zero. what interesting is, "h, err := s.nextObjectHeader()" in "SeekObjectHeader" function from scanner.go return 000000000000000000000 for all the files. so I print all the objectHeader and find the function in trouble is func (s *Scanner) nextObjectHeader() (*ObjectHeader, error) we did not set h.Reference for objectHeader except for REFDeltaObject. for all other objectHeader types, we leave the h.Reference with nil. Of course it's not the fault of this commit, but this commit definitely rely on the plumbing.Hash check for objectHeader. |
FWIW we also have a test reproducing what appears to be the same problem (or at least surfaced by the same error) which was passing on |
Hi peeps. Sorry this seems to have broken things. The problem lies in go-git/plumbing/format/packfile/packfile.go Lines 418 to 438 in db4233e
For a start it looks like I made a mistake in L424-L435. That should be: func (p *Packfile) readOFSDeltaObjectContent(h *ObjectHeader, deltaRC io.ReadCloser) (io.ReadCloser, error) {
hash, err := p.FindHash(h.OffsetReference)
if err != nil {
return nil, err
}
base, err := p.objectAtOffset(h.OffsetReference, hash)
if err != nil {
return nil, err
}
return ReaderFromDelta(h, base, deltaRC)
} but then it looks like the ReaderFromDelta function isn't applying correctly. I'll keep on looking. Apologies - I didn't manage to get a test case to hit this. |
Here is a simple test case pulled out of our code base: package main
import (
"testing"
"github.com/go-git/go-git/v5"
)
func TestRepro(t *testing.T) {
_, err := git.PlainClone(t.TempDir(), false, &git.CloneOptions{
URL: "https://github.com/hashicorp/terraform",
ReferenceName: "main",
Depth: 1,
Tags: git.NoTags,
})
if err != nil {
t.Fatal(err)
}
} $ go test ./...
I guess you may be looking for one that doesn't involve cloning a big repository, i.e. one that can be just added to this codebase, but I thought I'd mention it anyway. |
OK so I think the issue is the scanner is seeking away from the delta - I'll put up a PR that just reads the whole OFS delta and REF delta into memory, and fixes the other bug I've noted above. There will be potentially be some memory issue here whereby the delta could be too large so this will need further review. |
Unfortunately the code in go-git#303 was incorrect for large objects in OFS deltas. The underlying fault is that the seeker in the packfile was seeking away whilst the delta was being read. This PR simply reads the delta into a buffer. Fix go-git#323 Signed-off-by: Andrew Thornton <art27@cantab.net>
Sorry about this once again. |
I think we can close this now #329 is merged |
I can confirm that the same test that was previously failing in our codebase with |
This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem. Objects greater than 1Mb are now no longer directly stored in the cache or read completely into memory. This PR differs and improves the previous broken go-git#323 by fixing several bugs in the reader and transparently wrapping ReaderAt as a Reader. Signed-off-by: Andrew Thornton <art27@cantab.net>
yes, it works nicely for me now as well - thanks! |
… memory completely (#330) This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem. Objects greater than 1Mb are now no longer directly stored in the cache or read completely into memory. This PR differs and improves the previous broken #323 by fixing several bugs in the reader and transparently wrapping ReaderAt as a Reader. Signed-off-by: Andrew Thornton <art27@cantab.net>
I think this can be closed now? |
… memory completely (go-git#330) This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem. Objects greater than 1Mb are now no longer directly stored in the cache or read completely into memory. This PR differs and improves the previous broken go-git#323 by fixing several bugs in the reader and transparently wrapping ReaderAt as a Reader. Signed-off-by: Andrew Thornton <art27@cantab.net>
@pjbgf this looks like it can be closed now. |
Thanks everybody that confirmed this is now working. @AriehSchneier thanks for the heads up. |
… memory completely (go-git#330) This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem. Objects greater than 1Mb are now no longer directly stored in the cache or read completely into memory. This PR differs and improves the previous broken go-git#323 by fixing several bugs in the reader and transparently wrapping ReaderAt as a Reader. Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath Before applying commit "720c192", these codes works:
After the commit, these codes return a "object not found" error.
Because now the 3mb file "jyut6ping3.dict.yaml" in commit tree object can't not have a blob Reader().
That is, the b.obj.Reader() function in plumbing/object/object.go can't return a valid io.ReadCloser and nil error now.
The text was updated successfully, but these errors were encountered: