Skip to content

*: Initial support for opening sha256 repositories using ObjectID #1527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 31 commits into
base: v6-transport
Choose a base branch
from

Conversation

pjbgf
Copy link
Member

@pjbgf pjbgf commented Apr 23, 2025

Introduces a new plumbing.ObjectID which supports either SHA1 and SHA256 object formats, removing the need of using build tags for the sha256 support.

Follow-up PRs will expand support for other git features.

Relates to #899 #706.

pjbgf and others added 21 commits April 23, 2025 11:25
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
The support for SHA256 will require the mapping between ObjectFormat and the
correct hash.Hash to be used.

Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
The introduction of both new types are a stepping stone to enable
SHA256 support concurrently - without the need for a build tag.

ImmutableHash provides a way to represent varied sized hashes (for SHA1
or SHA256) with a single type, while keeping similar performance of the
existing plumbing.Hash.

The names were picked in order to provide a clearer distinction on what
they do. ImmutableHash is the result of a hash operation and can't be
changed once calculated. ObjectHasher, adds the Git object header before
a normal hash operation that hash.Hash or plumbing/hash.Hash do.

The performance results shows that SHA1 operations could be slower,
however for SHA256 it can be over 3 times faster:

~Hasher/hasher-sha1-100B-16         	            2202226	     574.7 ns/op	 174.00 MB/s     272 B/op     7 allocs/op
~Hasher/objecthash-sha1-100B-16     	            1511851	     772.6 ns/op	 129.43 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-100B-16   	            5057584	     247.4 ns/op	 404.21 MB/s      96 B/op     7 allocs/op
~Hasher/hasher-sha1-5000B-16        	              96476	     12523 ns/op	 399.25 MB/s    5536 B/op     7 allocs/op
~Hasher/objecthash-sha1-5000B-16    	             105376	     10983 ns/op	 455.27 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-5000B-16  	             420741	      2828 ns/op	1767.77 MB/s      96 B/op     7 allocs/op
~HashFromHex/hasher-parse-sha1-valid-16         	23243548     64.65 ns/op	 618.69 MB/s      48 B/op     1 allocs/op
~HashFromHex/objecthash-fromhex-sha1-valid-16   	18120699     79.62 ns/op	 502.36 MB/s      72 B/op     2 allocs/op
~HashFromHex/objecthash-fromhex-sha256-valid-16 	 8969871    124.2 ns/op	     515.22 MB/s      96 B/op     2 allocs/op

Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The binary.ReadHash is now replaced with ReadFrom on the
ObjectID struct.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
With the addition of multi-hash support, there is no longer
need to have a sha256 specific test that relies on build tags.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The new ObjectID replaces plumbing.Hash and supports both
SHA1 and SHA256. The backing array is sized after SHA256,
resulting in a larger memory footprint on a per hash basis.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The previous example of Blame, started from a clone, instead of
opening an existing repository.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
@pjbgf pjbgf added this to the v6.0.0 milestone Apr 23, 2025
Copy link
Member

@aymanbagabas aymanbagabas Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plumbing/hash package feels redundant and can be replaced with Go's crypto and hash packages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason why plumbing.hash needs to exist currently is to support hash.RegisterHash within the explicit context of go-git. If we were to remove that, and lean on upstream's instead, then we would need to rethink on the UX.

I'm not sure we should force users to replace their sha1 implementation across their app, if they only want to use it in the context of go-git.
IIRC, upstream had a recent performance optimisations whereby they use both hashes: sha1cd when interacting with remote repositories, and sha1 for all local operations.

This feels like an orthogonal topic which could be dealt with as a separate issue/PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, don't we already do that when sha1cd gets imported? It replaces upstream's sha1 in the registry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. That will need to change upstream depending on the approach we take.

I'll make a few tests around this, which may result in either the removal of plumbing/hash or the landing of the hybrid approach (if the performance gains justifies it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of the upstream behaviour will be handled separately, the same applies to the changes in go-git.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can remove plumbing/hash and combine the implementations of sha1 and sha256 hasher and object id. Also, instead of using interfaces for ObjectHasher, we can use a concrete type *ObjectHasher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened pjbgf#27 against your branch 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of the upstream behaviour will be handled separately, the same applies to the changes in go-git.

I think that's fine and is expected. Ideally, go-git shouldn't directly import crypto hashers to avoid overriding the user's registered hashers. Users should be able to use custom hashers using the crypto register and interface, while go-git could provide a default import for sha1cd and crypto/sha256.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added your commits into the PR, PTAL.

There are some additional changes required around hashing and potentially the removal of our plumbing/hash, but I'd rather do that in a follow-up PR.

pjbgf added 3 commits April 26, 2025 13:02
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
ObjectID differs from the previous Hash as it has a new
format field. For the purposes for equality checking, both
values 'sha1' and '' should be perceived as the same, so
to avoid cases where the equality check is not being done
internally, the value 'sha1' is no longer used. The field
format should only be populated when its value is 'sha256'.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
@pjbgf pjbgf marked this pull request as ready for review April 26, 2025 12:59
This commit changes the ObjectFormat type from a string to an int. This
would make a default ObjectFormat declaration default to SHA1. A string
function is defined to return the string representation of the ObjectFormat.
This combines the ObjectHasher implementation to be hash function
agnostic. The SHA1 and SHA256 implementations are now combined into a
single implementation that uses the given format to determine the
hash function to use.
aymanbagabas and others added 4 commits April 27, 2025 07:48
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Users can opt-out of sha1cd by adding an empty import
for the upstream implementation:
_ "crypto/sha1"

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants