
memory leak with clone #315


Closed
axetroy opened this issue May 19, 2021 · 10 comments
Labels
stale Issues/PRs that are marked for closure due to inactivity

Comments

@axetroy

axetroy commented May 19, 2021

When I cloned a repository about 1 GB in size, the clone consumed about 1 GB of memory and generated a lot of stacks. Here is the code:

// Pull clones the repository into a per-task directory and returns its path.
func (p *Puller) Pull() (string, error) {
	p.logger.Printf("Pulling %s\n", p.repo)

	options := git.CloneOptions{
		URL:               p.repo,
		Progress:          p.writer,
		SingleBranch:      true,
		ReferenceName:     plumbing.ReferenceName("refs/heads/dev"),
		Depth:             1,
		RecurseSubmodules: git.DefaultSubmoduleRecursionDepth,
	}

	if p.auth != nil {
		options.Auth = *p.auth
	}

	tempDir := os.TempDir()

	rootPath := filepath.Join(tempDir, p.taskID)

	p.logger.Printf("Pulling project into directory %s\n", rootPath)

	// clone project
	fs := osfs.New(rootPath)

	gitDir := filepath.Join(rootPath, ".git")

	storage := filesystem.NewStorage(osfs.New(gitDir), cache.NewObjectLRU(cache.MiByte*50))

	_, err := git.CloneContext(p.ctx, storage, fs, &options)

	if err != nil {
		return "", errors.WithStack(err)
	}

	return rootPath, nil
}
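To put numbers on a report like this, one option is to measure how many bytes the clone allocates. The sketch below is a minimal, stdlib-only helper built on `runtime.ReadMemStats`; `allocatedDuring` is a hypothetical name, and the 8 MiB allocation in `main` is only a stand-in for the real `git.CloneContext` call. Note that `TotalAlloc` counts cumulative allocations (churn), while `HeapAlloc` is the live heap at the moment of reading, so neither is the exact peak.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the demo allocation reachable so the compiler cannot drop it.
var sink []byte

// allocatedDuring runs fn and reports how many heap bytes were allocated
// while it ran (cumulative, from MemStats.TotalAlloc) and the live heap
// size (MemStats.HeapAlloc) observed right after fn returned.
// Hypothetical helper, not part of go-git.
func allocatedDuring(fn func()) (allocated, liveAfter uint64) {
	var before, after runtime.MemStats
	runtime.GC() // start from a freshly collected heap
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	allocated, live := allocatedDuring(func() {
		// Stand-in workload; in practice, call git.CloneContext(...) here.
		sink = make([]byte, 8<<20)
	})
	fmt.Printf("allocated %d MiB, live heap %d MiB\n", allocated>>20, live>>20)
}
```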
@kbiao

kbiao commented Aug 12, 2021

Any progress on this issue?

I also found that setting the clone depth to 1 still pulls down the whole repository, which is a different effect from

git clone --depth 1 xxx@git

axetroy changed the title from "performance with clone in disk" to "memory leak with clone" on Oct 15, 2021
@axetroy
Author

axetroy commented Oct 15, 2021

Here is the memory profile for reference:

Showing nodes accounting for 407.33MB, 99.07% of 411.16MB total
Dropped 24 nodes (cum <= 2.06MB)
Showing top 10 nodes out of 14
      flat  flat%   sum%        cum   cum%
  325.48MB 79.16% 79.16%   325.48MB 79.16%  bytes.makeSlice
   81.84MB 19.91% 99.07%   407.33MB 99.07%  github.com/go-git/go-git/v5/plumbing/format/packfile.(*Parser).get
         0     0% 99.07%       64MB 15.57%  bytes.(*Buffer).ReadFrom
         0     0% 99.07%   261.48MB 63.60%  bytes.(*Buffer).Write
         0     0% 99.07%   325.48MB 79.16%  bytes.(*Buffer).grow
         0     0% 99.07%   407.96MB 99.22%  github.com/go-git/go-git/v5/plumbing/format/packfile.(*Parser).Parse
         0     0% 99.07%       64MB 15.57%  github.com/go-git/go-git/v5/plumbing/format/packfile.(*Parser).readData
         0     0% 99.07%   407.33MB 99.07%  github.com/go-git/go-git/v5/plumbing/format/packfile.(*Parser).resolveDeltas
         0     0% 99.07%       64MB 15.57%  github.com/go-git/go-git/v5/plumbing/format/packfile.(*Scanner).NextObject
         0     0% 99.07%       64MB 15.57%  github.com/go-git/go-git/v5/plumbing/format/packfile.(*Scanner).copyObject

@shlomi-dr

Hi, any updates on this? I may be experiencing something similar.

@axetroy, did you happen to find a way around that?

@tooptoop4

I am having the same OOM issue.

@pjbgf
Member

pjbgf commented Nov 10, 2022

Please give the latest version on master a try: recent changes (9490da0 and 123cdde) should improve memory usage.

@piotrkowalczuk

I'm currently working on a program that pulls hundreds of repositories. I'm experiencing a memory leak. It could be related to the issue described here. My WIP code looks like this:

func produceCommits(ctx context.Context, cloneURL *url.URL) <-chan commitProduct {
	out := make(chan commitProduct, 1)
	done := func(err error) {
		out <- commitProduct{
			Err: err,
		}
	}

	dir := tmp.ForURL(cloneURL)
	go func() {
		defer func() {
			// TODO: bring back?
			if err := os.RemoveAll(dir); err != nil {
				done(err)
			}
			close(out)
		}()

		buf := bytes.NewBuffer(nil)
		repo, err := git.PlainCloneContext(ctx, dir, false, &git.CloneOptions{
			URL:      cloneURL.String(),
			Progress: buf,
			// NoCheckout: true,
		})
		if err != nil {
			var ert *fs.PathError
			switch {
			case errors.As(err, &ert):
				done(ert)
				return
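			// PlainCloneContext returns a nil repository on error, so an
			// existing clone has to be reopened before it can be used.
			case errors.Is(err, git.ErrRepositoryAlreadyExists), errors.Is(err, git.ErrRemoteExists):
				repo, err = git.PlainOpenWithOptions(dir, &git.PlainOpenOptions{})
				if err != nil {
					done(err)
					return
				}
			default: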
				done(err)
				return
			}
		}

		iter, err := repo.CommitObjects()
		if err != nil {
			done(err)
			return
		}

		err = iter.ForEach(func(commit *object.Commit) error {
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
			}

			wt, err := repo.Worktree()
			if err != nil {
				return err
			}

			if err := wt.Checkout(&git.CheckoutOptions{Hash: commit.Hash, Force: true, Keep: false}); err != nil {
				if errors.Is(err, git.NoErrAlreadyUpToDate) {
					return nil
				}

				return err
			}

			out <- commitProduct{
				Hash: commit.Hash,
				Dir:  dir,
			}

			return nil
		})
		if err != nil {
			done(err)
			return
		}
	}()

	return out
}

The profiles below show a diff between heap allocations gathered during a short run and a long run.

[Screenshots: heap profile diffs, short run vs. long run]

@shlomi-dr

shlomi-dr commented Jan 9, 2024

Since this was never solved, I had to revert to cloning by executing the git command itself from the app. Very unfortunate.


github-actions bot commented Apr 9, 2024

To help us keep things tidy and focus on the active tasks, we've introduced a stale bot to spot issues/PRs that haven't had any activity in a while.

This particular issue hasn't had any updates or activity in the past 90 days, so it's been labeled as 'stale'. If it remains inactive for the next 30 days, it'll be automatically closed.

We understand everyone's busy, but if this issue is still important to you, please feel free to add a comment or make an update to keep it active.

Thanks for your understanding and cooperation!

github-actions bot added the stale label (Issues/PRs that are marked for closure due to inactivity) on Apr 9, 2024
github-actions bot closed this as not planned on May 17, 2024
@axetroy
Author

axetroy commented May 17, 2024

The ultimate solution to this issue is to use the git command.

Each clone runs in an independent process that exits when it completes, so there is no need to worry about memory leaks; the git command is also much faster than go-git.

// Pull clones the code by shelling out to the git CLI.
func (p *Puller) Pull(ctx context.Context) error {
	p.logger.Printf("Pulling %s\n", p.options.Repo)

	p.logger.Printf("Pulling project into directory %s\n", p.options.Dir)

	switch {
	// Clone by commit hash
	case p.options.Hash != nil && *p.options.Hash != "":
		if err := util.EnsureDir(p.options.Dir); err != nil {
			return errors.WithStack(err)
		}

		if err := shell.Run(ctx, "git init", p.options.Writer, p.options.Dir); err != nil {
			return errors.WithStack(err)
		}

		if err := shell.Run(ctx, fmt.Sprintf("git fetch --no-tags --recurse-submodules --depth 1 --no-write-fetch-head %s %s", p.options.Repo, *p.options.Hash), p.options.Writer, p.options.Dir); err != nil {
			return errors.WithStack(err)
		}

		if err := shell.Run(ctx, fmt.Sprintf("git checkout %s", *p.options.Hash), p.options.Writer, p.options.Dir); err != nil {
			return errors.WithStack(err)
		}
	// Clone by tag
	case p.options.Tag != "":
		fallthrough
	// Clone by branch
	case p.options.Branch != "":
		var target string

		if p.options.Tag != "" {
			target = p.options.Tag
		} else {
			target = p.options.Branch
		}

		parentFolder := filepath.Dir(p.options.Dir)
		folderName := filepath.Base(p.options.Dir)

		if err := os.RemoveAll(p.options.Dir); err != nil {
			return errors.WithStack(err)
		}

		if err := shell.Run(ctx, fmt.Sprintf("git clone --single-branch --depth 1 --no-tags --progress --branch %s %s %s", target, p.options.Repo, folderName), p.options.Writer, parentFolder); err != nil {
			return errors.WithStack(err)
		}
	default:
		parentFolder := filepath.Dir(p.options.Dir)
		folderName := filepath.Base(p.options.Dir)

		if err := util.EnsureDir(parentFolder); err != nil {
			return errors.WithStack(err)
		}

		if err := shell.Run(ctx, fmt.Sprintf("git clone %s %s", p.options.Repo, folderName), p.options.Writer, parentFolder); err != nil {
			return errors.WithStack(err)
		}
	}

	if p.options.Recurse {
		if err := shell.Run(ctx, "git submodule update --init --recursive --single-branch --depth 1 --checkout --force --progress", p.options.Writer, p.options.Dir); err != nil {
			return errors.WithStack(err)
		}
	}

	// Update submodules that are configured to track their latest remote commit
	if p.options.Submodule != nil {
		for submodule, config := range *p.options.Submodule {
			if !config.Latest {
				continue
			}

			if err := shell.Run(ctx, fmt.Sprintf("git submodule update --remote --recursive --depth 1 --progress %s", submodule), p.options.Writer, p.options.Dir); err != nil {
				return errors.WithStack(err)
			}
		}
	}

	return nil
}
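The strategy selection in this function can also be expressed as a pure function that returns each git command as an argument slice, which keeps the branching logic unit-testable and avoids building shell strings with `fmt.Sprintf`. A minimal sketch assuming the same three cases (commit hash, tag or branch, plain clone); `cloneSteps` and `step` are hypothetical names, and submodule handling is left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// step is one git invocation: the directory to run in and its arguments
// (without the leading "git").
type step struct {
	Dir  string
	Args []string
}

// cloneSteps returns the git commands needed to fetch repo into dir.
// A commit hash takes precedence over ref (a tag or branch name); with
// neither, it falls back to a plain full clone.
func cloneSteps(repo, dir, hash, ref string) []step {
	parent, name := filepath.Dir(dir), filepath.Base(dir)
	switch {
	case hash != "":
		return []step{
			{dir, []string{"init"}},
			{dir, []string{"fetch", "--no-tags", "--depth", "1", "--no-write-fetch-head", repo, hash}},
			{dir, []string{"checkout", hash}},
		}
	case ref != "":
		return []step{{parent, []string{"clone", "--single-branch", "--depth", "1", "--no-tags",
			"--branch", ref, repo, name}}}
	default:
		return []step{{parent, []string{"clone", repo, name}}}
	}
}

func main() {
	for _, s := range cloneSteps("https://example.com/r.git", "/tmp/work/r", "", "dev") {
		fmt.Printf("(cd %s && git %s)\n", s.Dir, strings.Join(s.Args, " "))
	}
}
```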

@shlomi-dr

Yup. Unfortunately I had to abandon go-git for the most part. If I have to bundle the git executable in my container anyway just to clone, I might as well do the rest with it too; it is mostly faster and uses less RAM. I wish this weren't the case, but it is what it is 🤷‍♂️

luissimas added a commit to luissimas/zettelkasten-exporter that referenced this issue Jun 25, 2024
The current implementation of go-git has a high memory cost. Since our
needs are not that complex, we can use the default git CLI.

Reference:
go-git/go-git#315
6 participants