RFE: expose delegated metadata to client application #1995

Open
jku opened this issue May 4, 2022 · 16 comments

@jku
Member

jku commented May 4, 2022

EDIT: The overall issue is described in detail in https://docs.google.com/document/d/1rWHAM2qCUtnjWD4lOrGWE2EIDLoA7eSy4-jB66Wgh0o . The suggestion here is roughly the "Metadata role (file) as search index" solution in the document.

Assume a setup like this (this is what we expect a community artifact repository like PyPI to look like if it uses developer signatures with TUF):

  • a specific project/product team controls a delegated metadata file
  • TUF clients want to know details of all of the artifacts in this metadata (to e.g. figure out which versions of an artifact are available)

Currently there is no way for the client application to get the whole metadata content from ngclient. We could provide a call much like get_targetinfo() that instead of the TargetFile would return the Targets object where the target search ended:

def get_targets_metadata(target_path: str) -> Targets:
    """Returns the Targets object of the metadata where the search for target_path terminated."""

This is not applicable to every TUF repo:

  • it requires a "contract" between repository and client: client has to know of a target_path that is delegated to the correct metadata -- in the pypi example it could be e.g. the PyPI project name
  • this is only useful if all "related" target files are listed in the same metadata

But with those assumptions the client can now easily get not just the list of target files it's interested in but also any custom metadata embedded in the targets metadata.

I've not thought through all the cases (what happens if there is no targetpath match? What if there is no terminating delegation?) but I think this is something we could consider implementing.
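
To make the proposed lookup concrete, here is a minimal sketch of a search that returns the metadata where the lookup for a target path ended. It assumes a simplified in-memory model (plain dicts, with `REPO` and its role names invented for illustration) rather than the python-tuf Metadata API, uses `fnmatchcase` as a rough stand-in for TUF path pattern matching, and arbitrarily returns the last visited role when nothing matches, which is one possible answer to the open questions above and not a settled design.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory repository: role name -> simplified targets metadata.
# Each "delegations" entry is (role name, path patterns, terminating).
REPO = {
    "targets": {
        "targets": {},
        "delegations": [("projectx", ["projectx/*"], True)],
    },
    "projectx": {
        "targets": {
            "projectx/1.0.tar.gz": {"length": 10},
            "projectx/1.1.tar.gz": {"length": 12},
        },
        "delegations": [],
    },
}


def get_targets_metadata(repo, target_path):
    """Return (role name, metadata) where the search for target_path ended."""
    to_visit = ["targets"]
    visited = set()
    last = None
    while to_visit:
        name = to_visit.pop()
        if name in visited:
            continue
        visited.add(name)
        last = (name, repo[name])
        if target_path in repo[name]["targets"]:
            return last
        children = []
        for child, patterns, terminating in repo[name]["delegations"]:
            if any(fnmatchcase(target_path, p) for p in patterns):
                children.append(child)
                if terminating:
                    to_visit.clear()  # a terminating delegation stops backtracking
                    break
        # Reverse so the first matching delegation is visited first (pre-order).
        to_visit.extend(reversed(children))
    # Assumption: with no match, fall back to the last role that was visited.
    return last


name, md = get_targets_metadata(REPO, "projectx/1.0.tar.gz")
print(name, sorted(md["targets"]))
# projectx ['projectx/1.0.tar.gz', 'projectx/1.1.tar.gz']
```

With this shape the client passes any path it knows is delegated to the project's role and gets every entry of that role back, including any custom fields stored next to the entries.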

@JustinCappos
Member

I would expect that in the case that this target file delegates to other target files, it would include all of the items listed there. (This would continue transitively.) It would also need to handle special delegation cases, thresholds, etc. Is this your thinking as well?

@jku
Member Author

jku commented May 5, 2022

I was not thinking that, no -- but that could work as well...

My original idea was to literally return the equivalent of the "signed" JSON object of the "final" targets metadata (the one that terminates delegation either by containing the targetpath or by terminating=True). This would require no extra processing on the client's part but, as I mentioned, is only useful if all "related" target files are listed in the same metadata. Your idea of building a list of target files while doing the depth-first search through the delegation tree (and appending all target files iff the targets metadata happens to be part of the delegated portion of the tree) is certainly more complex, but it is interesting because it removes the limitation of "one metadata for related target files" -- I would have to prototype it to see if there are any unintended results.

This is what I proposed:

def get_targets_metadata(target_path: str) -> Targets:
    """Returns the Targets object of the metadata where the search for target_path terminated."""

This is what I think Justin is describing:

def get_all_targetinfos(target_path: str) -> List[TargetFile]:
    """Returns a list of all target files in all targets metadata that forms the delegating chain for 'target_path'"""

I don't think there's anything special to handle wrt thresholds etc.: if the delegations work for a normal targetpath search, they should work for this.
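
The second variant can be sketched in the same way. This is a rough illustration under stated assumptions: the repository is a hypothetical dict-based model (not the python-tuf Metadata API), path matching uses `fnmatchcase` as a simplification, the function returns target paths instead of TargetFile objects, and "the delegating chain" is interpreted as every role visited while resolving the path.

```python
from fnmatch import fnmatchcase

# Hypothetical repository: role name -> simplified targets metadata.
# Each "delegations" entry is (role name, path patterns, terminating).
REPO = {
    "targets": {
        "targets": {"README": {"length": 3}},
        "delegations": [("projectx", ["projectx/*"], False)],
    },
    "projectx": {
        "targets": {"projectx/1.0.tar.gz": {"length": 10}},
        "delegations": [("projectx-dev", ["projectx/*"], True)],
    },
    "projectx-dev": {
        "targets": {"projectx/1.1.tar.gz": {"length": 12}},
        "delegations": [],
    },
}


def get_all_targetinfos(repo, target_path):
    """Collect target paths from every role visited while resolving target_path."""
    found = []
    to_visit = ["targets"]
    visited = set()
    while to_visit:
        name = to_visit.pop()
        if name in visited:
            continue
        visited.add(name)
        found.extend(repo[name]["targets"])  # keep every entry of this role
        if target_path in repo[name]["targets"]:
            break  # the search terminates where the target is found
        children = []
        for child, patterns, terminating in repo[name]["delegations"]:
            if any(fnmatchcase(target_path, p) for p in patterns):
                children.append(child)
                if terminating:
                    to_visit.clear()
                    break
        to_visit.extend(reversed(children))
    return found


print(get_all_targetinfos(REPO, "projectx/1.1.tar.gz"))
# ['README', 'projectx/1.0.tar.gz', 'projectx/1.1.tar.gz']
```

Note that the top-level README is part of the result too, because this sketch does not filter the entries of earlier roles in any way.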

@dennisvang

dennisvang commented Jun 2, 2022

@jku If you are looking for another use case: it looks like our notsotuf client would also benefit from such a feature.

@jku
Member Author

jku commented Jun 2, 2022

@dennisvang it may have been your comments some time ago that got me thinking about it :)

Btw, if you have any feedback or suggestions on python-tuf 1.x from a downstream perspective, that would be very welcome -- creating an issue is fine, or Slack works too.

@lukpueh
Member

lukpueh commented Jun 7, 2022

This is what I proposed:

def get_targets_metadata(target_path: str) -> Targets:
    """Returns the Targets object of the metadata where the search for target_path terminated."""

This is what I think Justin is describing:

def get_all_targetinfos(target_path: str) -> List[TargetFile]:
    """Returns a list of all target files in all targets metadata that forms the delegating chain for 'target_path'"""

I think this is a useful feature! Here are some unordered thoughts/questions on the two existing proposals:

  • Both solutions require a contract between repo and client that the function yields all/only "related" target file infos.
  • Solution 2 seems more general/flexible, as it can cover cases where related target file infos are spread across multiple target metadata files AND cases where they are all in the last target file. This seems like an advantage.
  • Is solution 2 more prone to also serve unrelated target file infos? Probably not / depends on the "contract".
  • Why does solution 1 return a full Targets object and solution 2 a list of only TargetFiles?
  • Do we need the full Targets metadata in either case?
  • Is target_path here the same as in get_targetinfo or can it also be a path prefix or path pattern?
  • If target_path can be only a part of the path or a pattern, then the delegation tree might not resolve in the same way as the individual "related" files would.
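
The last point can be illustrated with a small sketch. It assumes a segment-wise glob match in which `*` does not cross `/` boundaries (a simplification of how TUF path patterns are matched) and two hypothetical delegations; none of the names come from python-tuf.

```python
from fnmatch import fnmatchcase


def matches(target_path: str, pattern: str) -> bool:
    """Segment-wise glob match, so '*' does not cross '/' boundaries."""
    path_parts = target_path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(a, b) for a, b in zip(path_parts, pattern_parts))


# Hypothetical delegations: (role name, path pattern)
DELEGATIONS = [
    ("projectx-wheels", "projectx/*.whl"),
    ("projectx-src", "projectx/*.tar.gz"),
]


def roles_for(target_path: str) -> list:
    """Return the names of all roles whose pattern matches target_path."""
    return [role for role, pattern in DELEGATIONS if matches(target_path, pattern)]


print(roles_for("projectx/1.0.whl"))     # ['projectx-wheels']
print(roles_for("projectx/1.0.tar.gz"))  # ['projectx-src']
print(roles_for("projectx/"))            # []
```

A bare prefix such as `projectx/` matches neither delegation here, even though every concrete file under it is delegated to one of the two roles.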

@dennisvang

@jku issue #822 looks related.

@jku
Member Author

jku commented Jun 8, 2022

Yes, it definitely is related. Searching is still a very complex beast and we shouldn't think this will solve that problem completely. I don't think this library can even solve searching in general; it really is an application problem. But we could provide this functionality so that repositories can design their content to support specific types of searches.

@jku
Member Author

jku commented Jun 15, 2022

Forgot to respond to Lukas here:

Is solution 2 more prone to also serve unrelated target file infos? Probably not / depends on the "contract".

Yeah, there's certainly a chance that an earlier targets metadata contains "unrelated" files that get listed (in the same sense that it allows multiple metadata to contain the "related" files). This is what makes the two approaches different...

Why does solution 1 return a full Targets object and solution 2 a list of only TargetFiles?
Do we need the full Targets metadata in either case?

Solution 1 returns Targets just because it can -- I figured this would allow e.g. custom fields in the Targets to be available to the client. In the second option it's not as simple.

I don't know of a specific need for Targets.

Is target_path here the same as in get_targetinfo or can it also be a path prefix or path pattern?

I think it has to be the same thing: an explicit targetpath that in this case is just used to find "all targetfiles in the chain of delegations for this targetpath" (or "...in the last delegation for this targetpath" for solution 1). It's a bit unintuitive but could be useful...

@jku
Member Author

jku commented Jun 15, 2022

Do we need the full Targets metadata in either case?

Oh, and the opinion I forgot: I lean towards the List[TargetFile] return value regardless of solution. The ngclient public API already includes TargetFile but does not currently expose the Signed derivatives or other Metadata API details, and I like that split.

@dennisvang

The List[TargetFile] option would be sufficient for our specific use-case.

Currently, we base our search on the target_path values obtained from tuf.ngclient.updater.Updater._trusted_set.targets.signed.targets.keys(), although I'm not sure that would always work.
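
As a rough illustration of that kind of client-side search, the sketch below derives the available versions of one archive from a list of target paths. The list stands in for the keys obtained from the trusted targets metadata, and the `<name>-<version>.tar.gz` naming scheme and the sample data are assumptions made for this example.

```python
import re

# Stand-in for the keys of a trusted Targets "targets" dict (hypothetical data).
target_paths = [
    "myapp-1.0.tar.gz",
    "myapp-1.10.tar.gz",
    "myapp-1.2.tar.gz",
    "other-3.0.tar.gz",
]

# Assumed naming scheme: <name>-<dotted numeric version>.tar.gz
PATTERN = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)\.tar\.gz$")


def available_versions(paths, name):
    """Return sorted version tuples for `name` found among the target paths."""
    versions = []
    for path in paths:
        match = PATTERN.match(path)
        if match and match.group("name") == name:
            # Compare versions numerically, so that 1.10 sorts after 1.2.
            versions.append(tuple(int(p) for p in match.group("version").split(".")))
    return sorted(versions)


versions = available_versions(target_paths, "myapp")
print(versions)      # [(1, 0), (1, 2), (1, 10)]
print(versions[-1])  # (1, 10)
```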

jku pushed a commit to jku/repository-playground that referenced this issue Jun 30, 2022
Remove index.json from the repository design: instead make the client
parse the targetpaths included in targets metadata. This is advantageous
because
* the repository is now simpler (no need to keep index.json and
  targetpaths in sync)
* client no longer needs to do an additional get_targetinfo() and
  download_target() to find the index file

The downside is that until
theupdateframework/python-tuf#1995 is fixed,
the client has to access ngclient internals.

Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
@jku
Member Author

jku commented Sep 22, 2022

Assume a setup like this (this is what we expect a community artifact repository like PyPI to look like if it uses developer signatures with TUF):

  • a specific project/product team controls a delegated metadata file
  • TUF clients want to know details of all of the artifacts in this metadata (to e.g. figure out which versions of an artifact are available)

Currently there is no way for the client application to get the whole metadata content from ngclient.

After discussing with @kairoaraujo we realised that using hashbin delegation anywhere in the delegation chain breaks this idea. Because the hashing happens over the complete artifact targetpath (and not some policy object like "project name") we can't possibly list all targets related to a project or find out the current version of a product.
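
A small sketch of why hashing over the complete targetpath gets in the way: the bin is derived from the digest of the whole path, so nothing ties the releases of one project to a single bin. The bin count, the bin selection rule and the path layout below are simplified assumptions, not the exact scheme of any particular repository.

```python
import hashlib

NUM_BINS = 16  # hypothetical bin count; real deployments pick their own


def bin_for(target_path: str) -> int:
    """Pick a bin from the SHA-256 digest of the complete target path."""
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % NUM_BINS


# Eight releases of the same hypothetical project.
paths = [f"projectx/projectx-1.{minor}.tar.gz" for minor in range(8)]
bins = {path: bin_for(path) for path in paths}
for path, bin_index in bins.items():
    print(f"bin {bin_index:2d}  {path}")
print("distinct bins used:", len(set(bins.values())))
```

Running this will most likely show the releases spread over several bins, so no single delegated metadata file can be expected to list all artifacts of the project.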

This is just a side effect of TUF not really understanding concepts like project, product or version: everything is an independent artifact in TUF. There are multiple questions this architecture (when using hashed bins) can't solve without additional data:

  • what is the newest version of product X?
  • which versions exist for product X?
  • what products are owned (signed) by project Y?

At least the first one is a question all package repository clients want to answer. Maybe larger repositories are just going to need an additional layer to handle that (and to store the project/product/version mapping in TUF target files to secure that info, just like PEP-458 currently does)...

This leads to another question: if you have to include more structured data about your artifacts in TUF already, why not include the TARGETINFO data there as well -- I mean the download URL and hashes. Why would you list those artifacts separately in TUF metadata and force your clients to do two round trips?

@lukpueh
Copy link
Member

lukpueh commented Oct 3, 2022

This leads to another question: if you have to include more structured data about your artifacts in TUF already, why not include the TARGETINFO data there as well -- I mean the download URL and hashes. Why would you list those artifacts separately in TUF metadata and force your clients to do two round trips?

A simple answer: To allow standardized target file verification without the need for concepts like project, product or version.

@jku
Member Author

jku commented Oct 3, 2022

But I am talking about the case where project, product and version are needed by the client code to even find the final target it wants to download. The reality is that listing targets separately (when the client needs the extra structured data anyway, like pip does) leads to more complex client code, larger metadata files and an additional server round trip for every download, as seen in the pip prototypes...

Even with the app-specific structured data, the client could still use Updater.download_target() to verify the final targets: the only thing it needs to do is extract the correct TARGETINFO data from the application-specific structured data.
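
A minimal sketch of that extraction step, under loud assumptions: the index layout, the field names and the URL are invented for illustration, and a plain hashlib comparison stands in for the verification that the client library would otherwise perform.

```python
import hashlib
import json

artifact = b"example artifact bytes"

# Hypothetical application-specific index. In this design it would itself be
# a TUF target, so its content is already covered by TUF verification.
index_json = json.dumps({
    "project": "projectx",
    "releases": {
        "1.0": {
            "url": "https://example.com/projectx-1.0.tar.gz",
            "length": len(artifact),
            "hashes": {"sha256": hashlib.sha256(artifact).hexdigest()},
        }
    },
})


def verify_artifact(data: bytes, info: dict) -> None:
    """Raise ValueError if data does not match the length and sha256 in info."""
    if len(data) != info["length"]:
        raise ValueError("length mismatch")
    if hashlib.sha256(data).hexdigest() != info["hashes"]["sha256"]:
        raise ValueError("hash mismatch")


# The client picks the release it wants and extracts its TARGETINFO-like data.
info = json.loads(index_json)["releases"]["1.0"]
verify_artifact(artifact, info)  # passes silently for matching data
```

The point of the sketch is only that one trusted download (the index) carries everything needed to locate and verify the artifact.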

jku changed the title from "RFE: expose delegating metadata to client application" to "RFE: expose delegated metadata to client application" on Oct 7, 2022
@jku
Member Author

jku commented Oct 12, 2022

The issue is described in detail in https://docs.google.com/document/d/1rWHAM2qCUtnjWD4lOrGWE2EIDLoA7eSy4-jB66Wgh0o

The original suggestion in the issue description is roughly the "Metadata role (file) as search index" solution in the document.

@jku
Member Author

jku commented Dec 2, 2022

I guess I should update my current thinking on this.

I think exposing the metadata to clients as described has security implications that may mean this is not a good idea. The fact that a delegated role's metadata contains targetpaths does not mean that those targetpaths have been delegated to the role. So exposing the list as-is seems wrong, even if this is documented as unsafe.

The only really safe way to do this would be to run the delegation lookup for each targetpath listed, and only expose it to client if the targetpath really is delegated to the role in question. This sounds a bit wasteful but in practice might work just fine: in usual cases this would not lead to new metadata downloads and all required metadata would already be loaded in memory.
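
The filtering idea can be sketched as follows. This is an illustration on a hypothetical dict-based model (not the python-tuf Metadata API, and not necessarily what the linked branch does): every targetpath listed in a role is looked up again from the top, and it is exposed only if that lookup ends in the same role.

```python
from fnmatch import fnmatchcase

# Hypothetical repository: role name -> simplified targets metadata.
# Each "delegations" entry is (role name, path patterns, terminating).
REPO = {
    "targets": {
        "targets": {},
        "delegations": [
            ("projectx", ["projectx/*"], True),
            ("projecty", ["projecty/*"], True),
        ],
    },
    "projectx": {
        "targets": {
            "projectx/1.0.tar.gz": {},
            "projecty/9.9.tar.gz": {},  # listed, but never delegated to projectx
        },
        "delegations": [],
    },
    "projecty": {"targets": {"projecty/1.0.tar.gz": {}}, "delegations": []},
}


def resolve_role(repo, target_path):
    """Return the name of the role that provides target_path, or None."""
    to_visit = ["targets"]
    visited = set()
    while to_visit:
        name = to_visit.pop()
        if name in visited:
            continue
        visited.add(name)
        if target_path in repo[name]["targets"]:
            return name
        children = []
        for child, patterns, terminating in repo[name]["delegations"]:
            if any(fnmatchcase(target_path, p) for p in patterns):
                children.append(child)
                if terminating:
                    to_visit.clear()
                    break
        to_visit.extend(reversed(children))
    return None


def list_delegated_targets(repo, role):
    """List only the target paths in role that the lookup resolves to role."""
    return [p for p in repo[role]["targets"] if resolve_role(repo, p) == role]


print(list_delegated_targets(REPO, "projectx"))  # ['projectx/1.0.tar.gz']
```

The entry `projecty/9.9.tar.gz` is dropped because the lookup for that path never reaches the projectx role, which is exactly the case that makes exposing the raw list unsafe.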

jku added a commit to jku/python-tuf that referenced this issue Dec 2, 2022
WIP: This might work

Fixes theupdateframework#1995 ?

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku
Member Author

jku commented Dec 6, 2022

Linking to my rough branch so it doesn't get lost: https://github.com/jku/python-tuf/commits/list-targets

  • needs tests
  • the delegated role's metadata (or even role name) is never exposed to the client application in this approach
  • the original targetpath argument does not need to be an existing targetpath: the last handled delegated role is used in any case
