RFE: expose delegated metadata to client application #1995
Comments
I would expect that in the case that this target file delegates to other target files, it would include all of the items listed there. (This would continue transitively.) It would also need to handle special delegation cases, thresholds, etc. Is this your thinking as well? |
I was not thinking that, no -- but that could work as well... My original idea was to literally return the equivalent of the "signed" json object of the "final" targets metadata (the one that terminates delegation either by containing the targetpath or by
This is what I proposed:
    def get_targets_metadata(target_path: str) -> Targets:
        """Returns a Targets object of the metadata where the search for target_path terminated"""
This is what I think Justin is describing:
    def get_all_targetinfos(target_path: str) -> List[TargetFile]:
        """Returns a list of all target files in all targets metadata that form the delegating chain for 'target_path'"""
I don't think there's anything special to handle wrt threshold etc: if the delegations work for normal targetpath search, they should work for this. |
@jku If you are looking for another use case: it looks like our notsotuf client would also benefit from such a feature. |
@dennisvang it may have been your comments some time ago that got me thinking about it :) Btw if you have any feedback or suggestions on python-tuf 1.x from downstream perspective, that would be very welcome -- creating issue is fine or slack works too |
I think this is a useful feature! Here are some unordered thoughts/questions on the two existing proposals:
|
Yes it definitely is related. Searching is still a very complex beast and we shouldn't think this will actually solve that problem completely: I don't think this library even can solve searching in general: it really is an application problem. But we could provide this functionality so that repositories can design their content so that this functionality can be used for specific types of searches. |
Forgot to respond to lukas here:
Yeah, there's certainly a chance that an earlier targets metadata contains "unrelated" files that get listed (in the same sense that it allows multiple metadata to contain the "related" files). This is what makes the two approaches different...
Solution 1 returns Targets just because it can -- I figured this would allow e.g. custom fields in the Targets to be available to the client. In the second option it's not as simple. I don't know of a specific need for Targets.
I think it has to be the same thing: an explicit targetpath that in this case is just used to find "all targetfiles in the chain of delegations for this targetpath" (or "...in the last delegation for this targetpath" for solution 1). It's a bit unintuitive but could be useful... |
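To make the difference between the two proposals concrete, here is a minimal sketch over a toy in-memory repository. Nothing here is python-tuf API: `ToyRole`, `walk` and the example `repo` are hypothetical stand-ins, path patterns are matched with plain `fnmatch` (a simplification of real TUF path matching), and `get_all_targetinfos` returns target paths from every role the search visited, which coincides with the delegating chain only in simple layouts like this one.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Dict, List, Tuple


@dataclass
class ToyRole:
    """Hypothetical stand-in for one targets metadata file (not a python-tuf class)."""
    name: str
    targets: Dict[str, dict] = field(default_factory=dict)  # targetpath -> info
    # Ordered delegations: (path pattern, delegated role name, terminating?)
    delegations: List[Tuple[str, str, bool]] = field(default_factory=list)


def walk(repo: Dict[str, ToyRole], target_path: str) -> List[ToyRole]:
    """Return the roles visited, in order, by a pre-order search for target_path."""
    visited: List[ToyRole] = []
    seen = set()
    stack = ["targets"]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        role = repo[name]
        visited.append(role)
        if target_path in role.targets:
            break  # search ends: this role lists the target
        children = []
        for pattern, child, terminating in role.delegations:
            if fnmatch(target_path, pattern):
                children.append(child)
                if terminating:
                    stack.clear()  # never backtrack past a terminating delegation
                    break
        stack.extend(reversed(children))
    return visited


def get_targets_metadata(repo: Dict[str, ToyRole], target_path: str) -> ToyRole:
    """Proposal 1 (sketch): the role where the search for target_path ended."""
    return walk(repo, target_path)[-1]


def get_all_targetinfos(repo: Dict[str, ToyRole], target_path: str) -> List[str]:
    """Proposal 2 (sketch): target paths listed by every role the search visited."""
    return [path for role in walk(repo, target_path) for path in role.targets]


# Hypothetical layout: targets -> projects -> foo
repo = {
    "targets": ToyRole("targets", {"README": {}}, [("projects/*", "projects", True)]),
    "projects": ToyRole("projects", {}, [("projects/foo/*", "foo", True)]),
    "foo": ToyRole(
        "foo",
        {"projects/foo/foo-1.0.tar.gz": {}, "projects/foo/foo-1.1.tar.gz": {}},
    ),
}

query = "projects/foo/foo-2.0.tar.gz"  # need not exist; it only steers the search
print(get_targets_metadata(repo, query).name)
print(get_all_targetinfos(repo, query))
```

With this layout the first call ends at the `foo` role, while the second also picks up the "unrelated" `README` from the top-level role, which is exactly the difference discussed above.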
Oh and the opinion I forgot: I maybe lean towards the |
The Currently, we base our search on the |
Remove index.json from the repository design: instead make the client parse the targetpaths included in targets metadata. This is advantageous because
* the repository is now simpler (no need to keep index.json and targetpaths in sync)
* client no longer needs to do an additional get_targetinfo() and download_target() to find the index file
The downside is that until theupdateframework/python-tuf#1995 is fixed, the client has to access ngclient internals.
Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
After discussing with @kairoaraujo we realised that using hashbin delegation anywhere in the delegation chain breaks this idea. Because the hashing happens over the complete artifact targetpath (and not some policy object like "project name") we can't possibly list all targets related to a project or find out the current version of a product. This is just a side effect of TUF not really understanding concepts like project, product or version: everything is an independent artifact in TUF. There are multiple questions this architecture (when using hashed bins) can't solve without additional data:
At least the first one is a question all package repository clients want to answer. Maybe larger repositories are just going to need an additional layer to handle that (and to store the project/product/version mapping in TUF target files to secure that info, just like PEP-458 currently does)... This leads to another question: if you have to include more structured data about your artifacts in TUF already, why not include the TARGETINFO data there as well -- I mean the download URL and hashes. Why would you list those artifacts separately in TUF metadata and force your clients to do two round trips? |
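The hashed-bin problem mentioned above can be illustrated with a small sketch. It assumes a toy bin assignment (`bin_for` is hypothetical, not the python-tuf implementation) that, like hashed bins in general, hashes the complete target path. Releases of one made-up project then get scattered over the bins, so no single bin metadata can serve as "the list of files for project foo".

```python
import hashlib


def bin_for(target_path: str, bin_count: int = 16) -> int:
    """Toy hash-bin assignment: hash the full target path, reduce modulo bin_count."""
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % bin_count


# Eight releases of one hypothetical project
paths = [f"projects/foo/foo-1.{i}.tar.gz" for i in range(8)]
bins = {path: bin_for(path) for path in paths}

for path, bin_index in bins.items():
    print(f"bin {bin_index:2d}  {path}")
```

Because the bin depends on the whole path rather than on a policy object like the project name, the client would have to know every targetpath in advance to find the right bins.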
A simple answer: To allow standardized target file verification without the need for concepts like project, product or version. |
But I am talking about the case where project, product and version are needed by the client code to even find the final target it wants to download: the reality is that this approach of listing targets separately (in the case where the client needs the extra structured data anyway, like pip does) leads to more complex client code, larger metadata files and an additional server roundtrip for every download, as seen in the pip prototypes... Even with the app-specific-structured-data client could still use |
The issue is described in detail in https://docs.google.com/document/d/1rWHAM2qCUtnjWD4lOrGWE2EIDLoA7eSy4-jB66Wgh0o The original suggestion in the issue description is roughly the Metadata role (file) as search index solution in the document. |
I guess I should update current thinking on this. I think exposing the metadata to clients as described has security implications that may mean this is not a good idea. The fact that a delegated role's metadata contains targetpaths does not mean that those targetpaths have been delegated to the role. So exposing the list as is seems wrong, even if this is documented as unsafe. The only really safe way to do this would be to run the delegation lookup for each targetpath listed, and only expose it to the client if the targetpath really is delegated to the role in question. This sounds a bit wasteful but in practice might work just fine: in usual cases this would not lead to new metadata downloads and all required metadata would already be loaded in memory. |
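A rough sketch of that "verify each listed targetpath" idea, using a toy in-memory model rather than ngclient: `find_owner`, `list_verified_targets` and the `repo` layout are hypothetical, path patterns use plain `fnmatch`, and every delegation is treated as non-terminating to keep it short.

```python
from fnmatch import fnmatch
from typing import Dict, List, Optional, Tuple

# role name -> (listed target paths, ordered delegations as (pattern, child role))
Repo = Dict[str, Tuple[List[str], List[Tuple[str, str]]]]


def find_owner(repo: Repo, target_path: str) -> Optional[str]:
    """Return the first role, in pre-order, that both is reached and lists target_path."""
    stack = ["targets"]
    seen = set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        listed, delegations = repo[name]
        if target_path in listed:
            return name
        children = [
            child for pattern, child in delegations if fnmatch(target_path, pattern)
        ]
        stack.extend(reversed(children))
    return None


def list_verified_targets(repo: Repo, role_name: str) -> List[str]:
    """Expose only the paths for which the normal lookup really ends at role_name."""
    listed, _ = repo[role_name]
    return [path for path in listed if find_owner(repo, path) == role_name]


# Hypothetical repository: role "foo" also lists a path that belongs to "bar"
repo: Repo = {
    "targets": ([], [("projects/foo/*", "foo"), ("projects/bar/*", "bar")]),
    "foo": (["projects/foo/foo-1.0.tar.gz", "projects/bar/bar-9.9.tar.gz"], []),
    "bar": (["projects/bar/bar-1.0.tar.gz"], []),
}

print(list_verified_targets(repo, "foo"))
```

Here the `bar-9.9` entry that `foo` lists is dropped, because the delegation lookup for that path never reaches `foo`. All lookups run against metadata that is already loaded, which matches the expectation that this is cheap in the usual case.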
WIP: This might work Fixes theupdateframework#1995 ? Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Linking to my rough branch so it doesn't get lost: https://github.com/jku/python-tuf/commits/list-targets
|
EDIT: The overall issue is described in detail in https://docs.google.com/document/d/1rWHAM2qCUtnjWD4lOrGWE2EIDLoA7eSy4-jB66Wgh0o . The suggestion here is roughly the Metadata role (file) as search index solution in the document.
Assume a setup like this (this is what we expect a community artifact repository like PyPI to look like if it uses developer signatures with TUF):
Currently there is no way for the client application to get the whole metadata content from ngclient. We could provide a call much like get_targetinfo() that instead of the TargetFile would return the Targets object where the target search ended:
This is not applicable to every TUF repo:
target_path that is delegated to the correct metadata -- in the pypi example it could be e.g. the PyPI project name
But with those assumptions the client can now easily get not just the list of target files it's interested in but also any custom metadata embedded in the targets metadata.
I've not thought through all the cases (what happens if there is no targetpath match? what if there is no terminating delegation?) but I think this is something we could consider implementing.