Commons:Structured data/Modeling

Work in progress
You can help! Data modeling is a community effort. Improve the guidelines, provide (additional) examples, or propose other data models for community discussion. Visit the talk page to discuss this process in general.

This page contains an overview of how to model information (metadata) about files on Wikimedia Commons in Structured Data.

The basics: structured data for every Commons file

The following structured data is relevant for every file on Wikimedia Commons. It roughly corresponds to the information stored in the Information template, a general-purpose infobox template used to describe files in wikitext.

Structured data to add | Brief instructions | In-depth instructions about the data model
File caption(s) (multilingual) | A short textual description of the file, in at least one language. Plain text; no wiki markup or hyperlinks. | Data modeling guidelines: File captions
Date | Usually the date when the file was created, using an inception (P571) statement. | Data modeling guidelines: Date
Source of the file | Information about where the file was taken from. Is it the uploader's own work, or was it uploaded from an external website? Typically uses a source of file (P7482) statement. | Data modeling guidelines: Source of the file
Creator | Who created the file? Typically described with a creator (P170) statement. | Data modeling guidelines: Creator of the file
Copyright status and license | Is the file still under copyright, or is it in the public domain? If it is still under copyright, which license(s) apply? Uses copyright status (P6216) and copyright license (P275). | Data modeling guidelines: Copyright and licenses
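
The table above can be read as a checklist. The following is a minimal sketch of such a check, assuming a hypothetical simplified representation in which captions are a dict of language code to text and statements are a dict keyed by property ID. Neither the function name nor this data shape is part of any Commons API.

```python
# Property IDs taken from the table above.
BASIC_PROPERTIES = {
    "P571": "Date (inception)",
    "P7482": "Source of the file",
    "P170": "Creator",
    "P6216": "Copyright status",
}
COPYRIGHTED = "Q50423863"  # "copyrighted", as used in the example on this page


def missing_basics(captions, statements):
    """Return the names of basic metadata elements that are absent.

    captions: dict of language code -> caption text (hypothetical shape).
    statements: dict of property ID -> list of values (hypothetical shape).
    """
    missing = []
    # A caption is expected in at least one language.
    if not any(text.strip() for text in captions.values()):
        missing.append("File caption")
    for prop, name in BASIC_PROPERTIES.items():
        if not statements.get(prop):
            missing.append(name)
    # A license is only expected when the file is marked as copyrighted.
    if COPYRIGHTED in statements.get("P6216", []) and not statements.get("P275"):
        missing.append("Copyright license")
    return missing
```

An empty result means all five rows of the table are covered; anything else lists what still needs to be added.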

If the above structured data is added to a file, the file's wikitext description can be simplified as follows:

Wikitext:

== {{int:filedesc}} ==
{{Information}}

== {{int:license-header}} ==
{{self|cc-by-sa-4.0}}

[[Category:Energica Ego]]

Main structured data:

  1. File caption: Energica Ego at Fully Charged Europe 2022. (English)
  2. Date: inception (P571) → 7 July 2022
  3. Source of the file: source of file (P7482) → original creation by uploader (Q66458942)
  4. Creator of the file: creator (P170) → Jan Ainali (Q23899609), qualified with object of statement has role (P3831) → photographer (Q33231)
  5. Copyright and licenses: copyright status (P6216) → copyrighted (Q50423863) and copyright license (P275) → Creative Commons Attribution-ShareAlike 4.0 International (Q18199165)
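
As a rough illustration, the caption and statements of the example above could be written out in the Wikibase JSON style for snaks and statements. This is a sketch only: the helper names `item_snak` and `statement` are hypothetical, nothing is sent to any API, and the field layout should be checked against the Wikibase JSON documentation before use.

```python
import json


def item_snak(prop, qid):
    """Build a value snak that points at a Wikidata item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": qid},
        },
    }


def statement(mainsnak, qualifiers=None):
    """Wrap a main snak (and optional qualifier snaks) as a statement."""
    result = {"type": "statement", "rank": "normal", "mainsnak": mainsnak}
    if qualifiers:
        # Qualifier snaks are grouped by property ID.
        result["qualifiers"] = {}
        for snak in qualifiers:
            result["qualifiers"].setdefault(snak["property"], []).append(snak)
    return result


# Caption in English, plain text only.
captions = {
    "en": {"language": "en", "value": "Energica Ego at Fully Charged Europe 2022."}
}

# inception (P571): 7 July 2022, with day precision.
inception_snak = {
    "snaktype": "value",
    "property": "P571",
    "datavalue": {
        "type": "time",
        "value": {
            "time": "+2022-07-07T00:00:00Z",
            "timezone": 0,
            "before": 0,
            "after": 0,
            "precision": 11,  # 11 = day
            "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
        },
    },
}

statements = [
    statement(inception_snak),
    statement(item_snak("P7482", "Q66458942")),  # source: original creation by uploader
    statement(
        item_snak("P170", "Q23899609"),  # creator: Jan Ainali
        qualifiers=[item_snak("P3831", "Q33231")],  # role: photographer
    ),
    statement(item_snak("P6216", "Q50423863")),  # copyright status: copyrighted
    statement(item_snak("P275", "Q18199165")),  # license: CC BY-SA 4.0
]

# The top-level keys here are for display only, not a request format.
print(json.dumps({"captions": captions, "statements": statements}, indent=2))
```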

An overview of further structured data properties that are in active use can be found at Commons:Structured data/Properties table.

The specifics: case examples of common Commons files

Own work upload directly to Commons

To describe a simple {{Own}} work uploaded directly by the author, or a file {{Self}}-licensed by the uploader:

  1. File caption: one or more short description(s) of the file + language
  2. Date: inception (P571), see Commons:Structured data/Modeling/Date
  3. Source of the file: source of file (P7482) → original creation by uploader (Q66458942)
  4. Creator of the file: creator (P170) → "some value" to indicate the creator doesn't have a Wikidata item. Qualified with:
  5. Copyright and licenses: copyright license (P275) and copyright status (P6216), see Commons:Structured data/Modeling/Copyright
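
The "some value" entry in step 4 corresponds, in the Wikibase JSON style, to a snak whose type is `somevalue` rather than `value`. A minimal sketch under that assumption follows; the function name is hypothetical and qualifiers are omitted.

```python
def unknown_creator_statement():
    """Creator (P170) statement that uses 'some value' instead of an item."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            # A somevalue snak carries no datavalue: the creator exists
            # but is not identified by a Wikidata item.
            "snaktype": "somevalue",
            "property": "P170",
        },
    }
```

Because there is no item to point at, details that identify the creator have to be carried by qualifiers on this statement.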

Upload from a platform like Panoramio, Geograph or Flickr

To describe a file uploaded directly from a platform (preferably by a tool or bot, for consistency):

  1. File caption: one or more short description(s) of the file + language
  2. Date: inception (P571), see Commons:Structured data/Modeling/Date
  3. Source of the file: source of file (P7482) → file available on the internet (Q74228490) to indicate the source
  4. Creator of the file: creator (P170) → "some value" to indicate the creator doesn't have a Wikidata item. Qualified with:
  5. Copyright and licenses: copyright license (P275) and copyright status (P6216), see Commons:Structured data/Modeling/Copyright
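
The main difference from an own-work upload is the value of the source of file (P7482) statement. The sketch below makes that choice explicit; the function name and the `own_work` flag are hypothetical and only illustrate the two cases described on this page.

```python
SOURCE_OF_FILE = "P7482"
ORIGINAL_CREATION_BY_UPLOADER = "Q66458942"
FILE_AVAILABLE_ON_THE_INTERNET = "Q74228490"


def source_of_file(own_work):
    """Return the (property ID, item ID) pair for the source statement."""
    if own_work:
        return (SOURCE_OF_FILE, ORIGINAL_CREATION_BY_UPLOADER)
    # Uploads taken from a platform such as Panoramio, Geograph or Flickr.
    return (SOURCE_OF_FILE, FILE_AVAILABLE_ON_THE_INTERNET)
```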

For Flickr uploads please also see Commons:Flickypedia/Data Modeling

Pronunciation

  1. Copyright and licenses: copyright license (P275) and copyright status (P6216), see Commons:Structured data/Modeling/Copyright
  2. Type: instance of (P31) → pronunciation file (Q108167708)
  3. Language: language of work or name (P407) → e.g. French (Q150)
  4. Transcription: audio transcription (P9533) → "<verbatim>" to describe what is pronounced
  5. Recording date: recording date (P10135)
  6. Who recorded it: recordist (P10893)
  7. Who pronounced it: spoken by (P10894)
  8. IDs: e.g. Lingua Libre ID (P10369) → "<id>" to describe the source identifier if applicable
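
The list above can be summarized as a mapping from a recording's metadata to property and value pairs. The following is a sketch using a hypothetical `PronunciationFile` container and plain strings as values. It does not model Wikibase datatypes, copyright statements, or people without a Wikidata item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PronunciationFile:
    """Hypothetical container for the metadata of one pronunciation recording."""

    language_qid: str  # e.g. "Q150" for French
    transcription: str  # the word or phrase that is pronounced
    recording_date: Optional[str] = None  # date as a string
    recordist: Optional[str] = None  # item ID of the person who recorded it
    speaker: Optional[str] = None  # item ID of the person who pronounced it
    lingua_libre_id: Optional[str] = None  # source identifier, if applicable


def to_property_values(rec):
    """Map the record to (property ID, value) pairs, skipping empty fields."""
    pairs = [
        ("P31", "Q108167708"),  # instance of: pronunciation file
        ("P407", rec.language_qid),  # language of work or name
        ("P9533", rec.transcription),  # audio transcription
        ("P10135", rec.recording_date),  # recording date
        ("P10893", rec.recordist),  # recordist
        ("P10894", rec.speaker),  # spoken by
        ("P10369", rec.lingua_libre_id),  # Lingua Libre ID
    ]
    return [(prop, value) for prop, value in pairs if value is not None]
```

For example, `to_property_values(PronunciationFile(language_qid="Q150", transcription="bonjour"))` yields only the type, language and transcription pairs, since the optional fields are unset ("bonjour" is an illustrative value).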

How to model more specific types of files

How to model specific types of metadata

Here, we look at specific types of metadata for a file:

GLAM

In some cases, large-scale content contributions mainly originating from Galleries, Libraries, Archives, and Museums (GLAM) use more specific data models.

It is highly recommended that all file metadata also complies with the general, basic data modeling recommendations listed above. This will make sure that all data on Wikimedia Commons can be uniformly searched and queried across the entire platform.
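
Uniform modeling pays off when searching: the same statement can be matched on every file, regardless of who uploaded it. As one illustration, the search keyword `haswbstatement` matches files by property and value. The helper below only builds such a query string; it is a hypothetical convenience and not part of any tool.

```python
def statement_query(*pairs):
    """Build a search string that requires every (property, item) pair."""
    return " ".join(f"haswbstatement:{prop}={qid}" for prop, qid in pairs)


# Files that are the uploader's own creation and licensed CC BY-SA 4.0.
print(statement_query(("P7482", "Q66458942"), ("P275", "Q18199165")))
# prints: haswbstatement:P7482=Q66458942 haswbstatement:P275=Q18199165
```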

Content-specific properties may be added, such as:

  1. The Metropolitan Museum of Art: The Met object ID (P3634) → "<id>"
  2. iNaturalist: iNaturalist observation ID (P5683) → "<id>"
  3. Digital Public Library of America → Please see: Commons:Digital Public Library of America/Modeling
  4. Biodiversity Heritage Library → Please see: Commons:Biodiversity Heritage Library/Modeling

Bots

Some bots automatically populate structured data (SDC) based on metadata in Commons templates.

General remarks
