A lot of people are deeply concerned about misinformation.
People often come to believe in falsehoods as part of how they identify with a social group. Once established, false beliefs are hard to overcome. Beliefs are a shorthand we use in trying to make sense of the world.
Misinformation is often propagated in order to engender delusion: a firmly-held belief that does not correspond with reality. Prominent examples of delusions include belief in a flat earth, in a link between vaccines and autism, or that the moon landing was staged.
Delusions – if sufficiently widespread or if promoted aggressively enough – can have a significant effect on the operation of our society, particularly when it comes to involvement in democratic processes.
Misinformation campaigns seek to drive these effects. For instance, promoting a false belief that immigrants are eating household pets might motivate the implementation of laws that lead to unjustifiable treatment of immigrants.
For some, the idea that technology might help with this sort of problem is appealing. If misinformation is the cause of harmful delusions, maybe having less misinformation would help.
The explosion in popularity and efficacy of generative AI has made the creation of content that carries misinformation far easier. This has sharpened a desire to build tools to help separate truth and falsehood.
A Security Mechanism
Preventing the promotion of misinformation can be formulated as a security goal. We might set out one of two complementary goals:
- It must be possible to identify fake content as fake.
- It must be possible to identify genuine content as genuine.
Our adversary might seek to pass off fake content as genuine. However, a weaker goal might be easier to achieve: the adversary only needs to avoid having their fake content identified as a fabrication.
Note that we assume that once a story is established as fake, most people will cease to believe it. That’s a big assumption, but we can at least pretend that this will happen for the purposes of this analysis.
In terms of capabilities, any adversary can be assumed to be capable of using generative AI and other tools to produce fake content. We also allow the adversary access to any mechanism used to distinguish between real and fake content[1].
Technical Options
Determining what is – or is not – truthful is not easy. Given an arbitrary piece of content, it is not trivial to determine whether it contains fact or fabrication. After all, if it were that simple, misinformation would not be that big a problem.
Technical proposals in this space generally aim for a less ambitious goal. One of two approaches is typically considered:
- Mark fake content as fake.
- Mark genuine content as genuine.
Both rely on the system that creates content knowing which of the two applies. The creator can therefore apply the requisite mark. As long as that mark survives to be read by the consumer of the content, what the creator knew about whether the content was “true” can be conveyed.
Evaluating these options against the goals of our adversary – who seeks to pass off fake content as “real” – is interesting. Each approach requires high levels of adoption to be successful:
- If an adversary seeks to pass off fake content as real, virtually all fake content needs to be marked as such. Otherwise, people seeking to promote fake content can simply use any means of production that don’t add markings. Markings also need to be very hard to remove.
- In comparison, genuine content markings might still need to be universally applied, but it might be possible to realize benefits when limited to specific outlets.
That makes markings on genuine content more appealing as a way to help counteract misinformation.
Attesting to Fakeness
If content (text, image, audio, or video) is produced with generative AI, it can carry some indication that it is fake. The output of many popular generative AI tools often includes both metadata and a small visible watermark.
These indications are pretty useless if someone is seeking to promote a falsehood. It is trivial to edit content to remove metadata. Similarly, visible watermarks can be edited out of images.
The response to that is a form of watermarking that is supposed to be impossible to remove. Either the generator embeds markings in the content as it is generated, or the marking is applied to the output content by a specialized process.
A separate system is then provided that can take any content and determine whether it was marked.
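To make the shape of that split – a marking step plus a separate detector – concrete, here is a deliberately fragile toy sketch. It hides a pseudorandom bit pattern in the least significant bits of an image; the seed, the threshold, and the scheme itself are inventions for illustration, not how any real vendor marks generated content.

```python
# Toy watermark: a marker and a separate detector sharing a secret seed.
import numpy as np

SEED = 1234  # shared secret between the marking system and the detector

def embed(pixels: np.ndarray) -> np.ndarray:
    rng = np.random.default_rng(SEED)
    bits = rng.integers(0, 2, size=pixels.shape, dtype=np.uint8)
    return (pixels & np.uint8(0xFE)) | bits  # overwrite the least significant bit plane

def detect(pixels: np.ndarray, threshold: float = 0.9) -> bool:
    rng = np.random.default_rng(SEED)
    bits = rng.integers(0, 2, size=pixels.shape, dtype=np.uint8)
    agreement = np.mean((pixels & 1) == bits)  # fraction of matching low bits
    return agreement > threshold

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
print(detect(embed(image)))  # True: the mark is present
print(detect(image))         # False: unmarked pixels match only ~50% by chance
```

Re-randomizing the low bits removes this particular mark entirely, which is exactly the removability problem at issue; real schemes spread the signal far more cleverly, but the marker/detector split is the same.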
The question then becomes whether it is possible to generate a watermark that cannot be removed. This paper makes a strong case for the negative by demonstrating that the removal – and re-application – of arbitrary watermarks is possible, requiring only access to the system that rules on whether the watermark is present.
Various generative AI vendors have implemented systems of markings, including metadata, removable watermarks, and watermarking that is supposed to be resistant to removal.
Furthermore, generative AI models have to be controlled so that people can’t generate their own content without markings. That is clearly not feasible, as much as some would like to retain control.
Even if model access could be controlled, it seems likely that watermarks will be removable. At best, this places the systems that apply markings in an escalating competition with adversaries that seek to remove (or falsify) markings.
Content Provenance
There’s a case to be made for the use of metadata in establishing where content came from, namely provenance. If the goal is to positively show that content was generated in a particular way, then metadata might be sufficient.
Provenance could work to label content as either fake or real. However, it is most interesting as a means of tracing real content to its source because that might be more feasible.
The most widely adopted system is C2PA. This system has received a lot of attention and is often presented as the answer to online misinformation.
An unpublished opinion piece that I wrote in 2023 about C2PA was highly critical. This post is a longer examination of what C2PA might offer and its shortcomings.
How C2PA Works
The C2PA specification is long and somewhat complicated[2], but the basics are pretty simple:
Content is digitally signed by the entity that produced it. C2PA defines a bunch of claims that all relate to how the content was created.
C2PA binds attributes to content in one of two ways. A “hard” binding uses a cryptographic hash, which ensures that any modification to the content invalidates the signature. A “soft” binding binds to a perceptual hash or a watermark (more on that below).
The C2PA metadata includes a bunch of attributes, including a means of binding to the content, all of which are digitally signed.
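As a rough sketch of what a hard binding involves – glossing over the actual C2PA serialization, certificate chains, and key protection – signing a cryptographic hash of the content looks something like the following. The freshly generated Ed25519 key stands in for a device- or vendor-held key.

```python
# Minimal hard-binding sketch (not the real C2PA manifest format).
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()  # stand-in for a device-held key

def sign_content(content: bytes) -> bytes:
    # Sign a SHA-256 digest of the content bytes.
    return signing_key.sign(hashlib.sha256(content).digest())

def verify_content(content: bytes, signature: bytes) -> bool:
    try:
        signing_key.public_key().verify(signature, hashlib.sha256(content).digest())
        return True
    except InvalidSignature:
        return False

original = b"...image bytes..."
sig = sign_content(original)
print(verify_content(original, sig))          # True
print(verify_content(original + b"!", sig))   # False: any edit breaks the binding
```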
An important type of attribute in C2PA is one that points to source material used in producing derivative content. For instance, if an image is edited, an attribute might refer to the original image. This is supposed to enable the tracing of:
- the original work, when the present work contains edits, or
- the components that comprise a derivative work.
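As an illustration of that chain of ingredients, here is a small sketch. The field names are mine, chosen for readability; they do not match the actual C2PA assertion labels.

```python
# Illustrative ingredient chain; field names are made up for clarity.
from dataclasses import dataclass, field

@dataclass
class Manifest:
    producer: str                                     # tool or device that produced this content
    actions: list = field(default_factory=list)       # edits applied to the ingredients
    ingredients: list = field(default_factory=list)   # manifests of the source material

capture = Manifest(producer="camera-model-x")
published = Manifest(
    producer="photo-editor",
    actions=["crop", "color-correct", "transcode-to-jpeg"],
    ingredients=[capture],
)

def trace(manifest: Manifest, depth: int = 0) -> None:
    # Walk back through the ingredients toward the original capture.
    print("  " * depth + manifest.producer)
    for ingredient in manifest.ingredients:
        trace(ingredient, depth + 1)

trace(published)  # photo-editor, then camera-model-x
```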
What Might Work in C2PA
Cryptographic assertions that come from secured hardware might be able to help identify “real” content.
A camera or similar capture device could use C2PA to sign the content it captures. Provided that the keys used cannot be extracted from the hardware[3], an assertion by the manufacturer might make a good case for the image being genuine.
The inclusion of metadata that includes URLs for source material – “ingredients” in C2PA-speak[4] – might also be useful in finding content that contains a manufacturer signature. That depends on the metadata including accessible URLs. As any assertion in C2PA is optional, this is not guaranteed.
Where C2PA Does Not Deliver
The weaknesses in C2PA are somewhat more numerous.
This section looks in more detail at some aspects of C2PA that require greater skepticism. These are the high-level items only; there are other aspects of the design that seem poorly specified or problematic[5], but the goal of this post is to focus on the primary problem.
C2PA Soft Bindings
A soft binding in C2PA allows for modifications of the content. The idea is that the content might be edited, but the assertions would still apply.
As mentioned, two options are considered in the specification:
- Perceptual hashing, which produces non-cryptographic digests of content that are intended to remain stable when content is edited (see the sketch after this list).
- Watermarking, which binds to a watermark that is embedded in the content.
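For a sense of how a perceptual hash behaves, here is a minimal sketch using the third-party imagehash and Pillow packages. The file names and the acceptance threshold are placeholders.

```python
# Minimal perceptual hash comparison (pip install imagehash pillow).
from PIL import Image
import imagehash

original = imagehash.phash(Image.open("original.jpg"))
edited = imagehash.phash(Image.open("edited.jpg"))  # e.g. resized or recompressed

distance = original - edited  # Hamming distance between the 64-bit hashes
print("same content?", distance <= 10)  # a verifier might accept small distances
```

The tolerance that lets the hash survive a resize or recompression is the same slack an adversary has to play with.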
In an adversarial setting, the use of perceptual hashes is well-studied, with numerous results that show exploitable weaknesses.
Perceptual hashes are not cryptographic hashes, so they do not offer the collision or preimage resistance that cryptographic hashes provide. Collision and second preimage attacks are most relevant here:
- Collision attacks – such as this one – give an adversary the ability to generate two pieces of content with the same fingerprint.
- Second preimage attacks – such as the one implemented with this code – allow an adversary to take content that produces one fingerprint and then modify completely different content so that it results in the same fingerprint.
Either attack allows an adversary to substitute one piece of content for another, though the preimage attack is more flexible.
Binding to a watermark appears to be even easier to exploit. Watermarks can often be removed – for example, using the TrustMark-RM mode of TrustMark[6] – and re-applied. That makes it possible to extract a watermark from one piece of content and copy it – along with any C2PA assertions – to entirely different content.
C2PA Traceability and Provenance
One idea that C2PA promotes is that source material might be traced. When content is edited in a tool that supports C2PA, the tool embeds information about the edits, especially any source material. In theory, this makes it possible to trace the provenance of C2PA-annotated content.
In practice, tracing provenance is unlikely to be a casual process. Some publisher sites might aid the discovery of source material but content that is redistributed in other places could be quite hard to trace[7].
Consider photographs that are published online. Professional images are captured in formats like RAW that are unsuitable for publication. Images are typically transcoded and edited before publication.
To trace provenance, editing software needs to embed its own metadata about changes[8], including a means of locating the original[9].
Any connection between the published and original content cannot be verified automatically in a reliable fashion. A hard, or cryptographic, binding is immediately invalidated by any edit.
Something like a perceptual hash might be used to automate this connection instead. However, as we’ve already established, perceptual hashes are vulnerable to attack. Any automated process based on a perceptual hash is therefore unreliable, so the relationship between edited and original content cannot be validated by a machine.
At best, a human might be able to look at images and reach their own conclusions. That supports the view that provenance information is unlikely to be able to take advantage of the scaling that might come from machine validation.
C2PA and DRM
With a published specification, anyone can generate a valid assertion. That means that C2PA verifiers need some means of deciding which assertions to believe.
For hardware capture of content (images, audio, and video), there are relatively few manufacturers. For the claims of a hardware manufacturer to be credible, they have to ensure that the keys they use to sign assertions can only be used with unmodified versions of their hardware.
That depends on having a degree of control. Control over access to secret keys in specialized hardware modules means that it might be possible to maintain the integrity of this part of the system.
There is some risk of this motivating anti-consumer actions on the part of manufacturers. For example, cameras could refuse to produce assertions when used with aftermarket lenses, or stop producing assertions if they are repaired.
As long as modifying hardware only results in a loss of assertions, that seems unlikely to be a serious concern for many people. Very few people seek to modify hardware[10].
The need to restrict editing software is far more serious. In order for edits to be considered trustworthy, strict controls are necessary.
The need for controls would make it impossible for open source software to generate trustworthy assertions. Assertions could only be generated by cloud-based – or maybe DRM-laden – software.
Completely New Trust Infrastructure
The idea of creating trust infrastructure for authenticating capture device manufacturers and editing software vendors is somewhat daunting.
Experience with the Web PKI shows that this is a non-trivial undertaking. A governance structure needs to be put in place to set rules for how inclusions – and exclusions – are decided. Systems need to be put in place for distributing keys and for managing revocation.
This is not a small undertaking. However, for this particular structure, it is not unreasonable to expect this to work out. With a smaller set of participants than the Web PKI, along with somewhat lower stakes, this seems possible.
Alternative Trust Infrastructure Options
In discussions about C2PA, when I raised concerns about DRM, Jeffrey Yasskin mentioned a possible alternative direction.
In that alternative, attestations are not made by device or software vendors. Content authors (or editors or a publisher) would be the ones to make any assertions. Assertions might be tied to an existing identity, such as a website domain name, avoiding any need to build an entirely new PKI.
A simple method would be to have content signed[11] by the site that claims it. That immediately helps with the problem of people attempting to pass off fake information as coming from a particular source.
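A sketch of that idea might look like the following. The well-known path, the raw key format, and the overall flow are hypothetical; nothing here corresponds to a deployed standard.

```python
# Hypothetical verification of content that claims to come from a given site.
import requests
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_claimed_source(content: bytes, signature: bytes, domain: str) -> bool:
    # Assumption: the site publishes a raw 32-byte Ed25519 key at this path.
    key_bytes = requests.get(f"https://{domain}/.well-known/content-key").content
    public_key = Ed25519PublicKey.from_public_bytes(key_bytes)
    try:
        public_key.verify(signature, content)
        return True
    except InvalidSignature:
        return False
```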
The most intriguing version of this idea relies on building a reputation system for content. If content can then be traced to its source, the reputation associated with that source can in some way be built up over time.
The key challenge is that this latter form changes from a definitive sort of statement – under C2PA, content is either real or not – to a more subjective one. That’s potentially valuable in that it encourages more active engagement with the material.
The idea of building new reputational systems is fascinating but a lot more work is needed before anything more could be said.
A Simpler Provenance
The difficulty of tracing, along with the problems associated with editing, suggests a simpler approach.
The benefits of C2PA might be realized by a combination of hardware-backed cryptographic assertions and simple pointers (that is, without digital signatures) from edited content to original content.
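Concretely, the published item might carry nothing more than a record like this. Every field name is illustrative.

```python
# Sketch of a minimal provenance record; every field name is illustrative.
published_item = {
    "content": "https://example.com/photos/edited.jpg",    # what readers actually see
    "original": "https://example.com/photos/capture.jpg",  # plain, unsigned pointer
    "capture_assertion": {
        "manufacturer": "camera-vendor",   # who vouches for the original capture
        "signature": "base64...",          # hardware-backed signature over the original only
    },
}
# A machine can check the signature on the original; comparing the edited
# image with that original is left to a human.
```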
Even then, an adversary still has a few options.
Trickery
When facial recognition systems were originally built, researchers found that some of these could be defeated by showing the camera a photo[12].
Generating a fake image with a valid assertion could be as simple as showing a C2PA camera a photograph[13]. The use of trick photography to create a false impression is also possible.
No Expectations
It is probably fair to say that – despite some uptake of C2PA – most content in existence does not include C2PA assertions.
Limited availability seriously undermines the value of any provenance system in countering misinformation. An attacker can remove metadata if people do not expect it to be present.
This might be different for media outlets that implement policies resulting in universal – or at least near-universal – use of something like C2PA. Then, people can expect that content produced by that outlet will contain provenance information.
Articles on social media can still claim to be from that outlet. However, it might become easier to refute that sort of false claim.
That might be reason enough for a media outlet to insist on implementing something like C2PA. After all, the primary currency in which journalistic institutions trade is their reputation. Having a technical mechanism that can support refutation of falsified articles has some value in terms of being able to defend their reputation.
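The resulting check is more about expectations than cryptography. A sketch, with placeholder inputs rather than a real verification API:

```python
# Expectation-based check; inputs are placeholders, not a real verification API.
KNOWN_SIGNING_OUTLETS = {"example-news.org"}  # outlets known to always attach provenance

def assess(claimed_outlet: str, has_valid_assertion: bool) -> str:
    if has_valid_assertion:
        return "provenance verified"
    if claimed_outlet in KNOWN_SIGNING_OUTLETS:
        return "suspicious: this outlet signs everything it publishes"
    return "inconclusive: absence of provenance proves nothing here"

print(assess("example-news.org", has_valid_assertion=False))  # suspicious
```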
The cost might be significant, if the benefits are not realized until nearly all content is traceable. That might entail replacing every camera used by journalists and outside contributors. Given the interconnected nature of news media, with many outlets publishing content that is sourced from partners, that’s likely a big ask.
A Lack of Respect for the Truth
For any system like this to be effective, people need to care about whether something is real or not.
It is not just about expectations: people have to be motivated to interrogate claims and seek the truth. That’s not a problem that can be solved by technical means.
Conclusion
The narrow applicability of the assertions for capture hardware suggests that a simpler approach might be better and more feasible. Some applications – such as in marking generated content – are probably ineffectual as a means of countering misinformation. The DRM aspect is pretty ugly, while not really adding any value.
All of which is to say that the technical aspects of provenance systems like C2PA are not particularly compelling.
We have to assume that people will need to be able to ask whether content is real or fake for the system to work. ↩︎
And – it pains me to say – it is not very good. I write specifications for a living, so I appreciate how hard it is to produce something on this scale. Unfortunately, this specification needs far more rigor. I suspect that the only way to implement C2PA successfully would be to look at one of the implementations. ↩︎
That’s a big “if”, though not implausible. Hardware keys used in consumer hardware have been extracted before, but defeating the techniques used for protecting secrets requires considerable resources. Even then, extraction would only invalidate the signatures from a single manufacturer or limited product lines, so it might not be worth the effort for an attacker targeting C2PA. ↩︎
C2PA can also indicate generative AI ingredients such as the text prompt used and the details of the generative model. That’s not much use in terms of protecting against use of content for misinformation, but it might have other uses. ↩︎
For instance, the method by which assertions can be redacted is pretty questionable. See my post on selective disclosure for more on what that sort of system might need to do. ↩︎ ↩︎
TrustMark is one of the soft binding mechanisms that C2PA recognizes. It’s also the first one I looked into. I have no reason to believe that other systems are better. ↩︎
C2PA does not use standard locators (such as https://), defining a new URI scheme instead. That suggests that the means of locating source material is likely not straightforward. ↩︎
I did not look into how much detail about edits is recorded. Some of the supporting material for C2PA suggests that this could be quite detailed, but that seems impractical and the specification only includes a limited set of edit attributes. ↩︎
C2PA also defines metadata for an image thumbnail. Nothing prevents this from including a false representation. ↩︎
This might be more feasible for images and video than for audio. Image and video capture equipment is often integrated into a single unit. Audio often features analog interconnections between components, which makes it harder to detect falsified inputs. ↩︎
Yes, we’ve been here before. Sort of. ↩︎
Modern systems use infrared or depth cameras that are harder to spoof so trivially, though not completely impossible: hardware spoofing and depth spoofing both appear to be feasible. ↩︎
C2PA has the means to attest to depth information, but who would expect that? Especially when you can redact any clues that might lead someone to expect it to be present[5:1]. ↩︎