Commons:Biodiversity Heritage Library/Modeling

This page documents BHL's data modeling for Structured Data on Commons (SDC) statements. Statements currently under discussion are marked in yellow; approved statements are changed to green, and anything we decide against or put on hold is changed to red.

Current data modeling work can also be viewed in the collaborative Google Doc.

BHL Image Data Model - In Progress

Wikidata Property Name Property Identifier Status Definition
BHL Page ID BHL page ID (P687) Required The numeric identifier associated with a BHL page, e.g. biodiversitylibrary.org/page/33665243
Collection collection (P195) Required art, museum, archival, or bibliographic collection the subject is part of
Copyright Status copyright status (P6216) Required As per Commons:Structured data/Modeling/Copyright, use public domain with qualifier applies to jurisdiction: United States of America
Copyright license copyright license (P275) Required As per Commons:Structured data/Modeling/Copyright, license under which this copyrighted work is released
Instance of instance of (P31) Required
Published In published in (P1433) Required The anatomy of plants: by Nehemiah Grew ...
Sponsor sponsor (P859) Required The financial sponsor of the digitization of the item; not to be confused with Holding Institution.
BHL creator ID BHL creator ID (P4081) Recommended identifier for a creator of a work in the Biodiversity Heritage Library (BHL)
RightsStatements.org statement RightsStatements.org statement according to source website (P6426) Recommended Rights statement controlled vocabulary; ACTION: JJ to discuss with Dominic which flavor of statement to use
Inception inception (P571) Recommended As per Commons:Structured data/Modeling/Date, mainly a 4-digit year (YYYY), or a range using inception date qualifiers: start time (YYYY), end time (YYYY)
Language language of work or name (P407) Recommended the language of the work
Creator creator (P170) Recommended As per Commons:Structured data/Modeling/Author, maker of this creative work or other object
Flickr Photo ID Flickr photo ID (P12120) Recommended Flickr Photo ID
Flickr user ID Flickr user ID (P3267) Optional / If applicable Added by FlickypediaBackfillrBot
BHL item ID BHL item ID (P11959) DECIDE identifier for an item in the Biodiversity Heritage Library (BHL)
Caption n/a DECIDE Captions are meant to be simple and short. While descriptions can be very expansive, captions are limited to 255 characters in length
Depicts depicts (P180) DECIDE Ideas: use the subject in EXIF metadata? Use the description? Some combination?
Description n/a DECIDE descriptions can be more expansive
Illustrator illustrator (P110) DECIDE If the artist/illustrator can be discerned
Internet Archive ID Internet Archive ID (P724) DECIDE mobot31753000008869
Taxon name taxon name (P225) DECIDE Do we want to use only canonical?
Taxon name taxon name (P225) DECIDE note: binomials will need to be split, cleaning required. Might need to do GBIF Species API treatment
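The Required / Recommended / Optional statuses in the table above lend themselves to a simple completeness check before upload. The following is a minimal sketch, not BHL's actual tooling: it assumes the statements for one image are held in a plain dict keyed by property ID, and the names `MODEL_STATUS` and `missing_properties` are hypothetical. Properties still marked DECIDE are left out.

```python
# Status of each settled property, transcribed from the table above.
# Rows still marked DECIDE are intentionally omitted.
MODEL_STATUS = {
    "P687": "Required",      # BHL page ID
    "P195": "Required",      # collection
    "P6216": "Required",     # copyright status
    "P275": "Required",      # copyright license
    "P31": "Required",       # instance of
    "P1433": "Required",     # published in
    "P859": "Required",      # sponsor
    "P4081": "Recommended",  # BHL creator ID
    "P6426": "Recommended",  # RightsStatements.org statement
    "P571": "Recommended",   # inception
    "P407": "Recommended",   # language of work or name
    "P170": "Recommended",   # creator
    "P12120": "Recommended", # Flickr photo ID
    "P3267": "Optional",     # Flickr user ID (optional / if applicable)
}


def missing_properties(statements, status="Required"):
    """Return property IDs with the given status that are absent from `statements`.

    `statements` is assumed to be a dict keyed by property ID (a hypothetical
    in-memory shape, not the Wikibase JSON format).
    """
    missing = [
        pid for pid, pid_status in MODEL_STATUS.items()
        if pid_status == status and pid not in statements
    ]
    # Sort numerically so P31 comes before P275, P859, P1433.
    return sorted(missing, key=lambda pid: int(pid[1:]))


if __name__ == "__main__":
    # Placeholder values; only the keys matter for the check.
    draft = {"P687": "33665243", "P195": "<collection>", "P6216": "<status>"}
    print(missing_properties(draft))
```

Running the check with `status="Recommended"` gives a softer report that can be logged rather than treated as a blocking error.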

Won't Map


These are fields currently stored in the BHL database that we have decided not to map to SDC.

BHL Metadata Field Recommended Definition Note
StartIssue Don't map This data is NULL in the BHL DB; no need to map
EndIssue Don't map This data is NULL in the BHL DB; no need to map
FlickrTags - TagValue Don't map Is there a Flickr Tag property in the works?
SF - They are currently in the "description" field. Machine tags for species names could become depicts statements: use a depicts statement that points to the taxon.
Needs to be carefully curated by the community because the tags are so varied.
FlickrURL Don't map persistent identifier for images on Flickr Already mapped to source by other image mapping exercises.
Data modeling conventions: https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling#Upload_from_a_platform_like_Panoramio,_Geograph_or_Flickr
https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling/Source
StartVolume Don't map Volume range for the item Is this data required, optional, etc.? We also have the item ID. Meaningless to publishers and researchers; the volume range was a decision made by the binder in the library.
EndVolume Don't map Volume range for the item example item with range: https://www.biodiversitylibrary.org/item/25495#page/1/mode/1up
DateImageAddedToFlickr Don't map Do we care?
PublicationStartDate Don't map If a series, then this is the date of first publication for a title Should we map this? Might be confusing for users? Publication range data can also be spotty in the BHL DB
SL - Use more granular item or page-level date instead
PublicationEndDate Don't map
Authors - Role Don't map JJ - I think author data should not be held at the image level but rather in Wikidata at the publication level
Authors - FullerForm Don't map JJ - I think author data should not be held at the image level but rather in Wikidata
Authors - StartDate Don't map JJ - I think author data should not be held at the image level but rather in Wikidata
Authors - EndDate Don't map JJ - I think author data should not be held at the image level but rather in Wikidata
FlickrNotes - NoteValue Don't map rarely is there data in these fields
FlickrNotes - FlickrAuthorID Don't map rarely is there data in these fields
FlickrNotes - FlickrAuthorName Don't map rarely is there data in these fields
FlickrNotes - CreationDate Don't map rarely is there data in these fields
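In practice, the decisions above amount to dropping these fields from each BHL record before SDC statements are built. The following is a minimal sketch under stated assumptions: a record is taken to be a flat dict keyed by the field names in the table, and `WONT_MAP`, `drop_unmapped`, and the `PageID` key in the example are hypothetical names, not part of the BHL database schema.

```python
# Field names transcribed from the "Won't Map" table above.
WONT_MAP = {
    "StartIssue",
    "EndIssue",
    "FlickrTags - TagValue",
    "FlickrURL",
    "StartVolume",
    "EndVolume",
    "DateImageAddedToFlickr",
    "PublicationStartDate",
    "PublicationEndDate",
    "Authors - Role",
    "Authors - FullerForm",
    "Authors - StartDate",
    "Authors - EndDate",
    "FlickrNotes - NoteValue",
    "FlickrNotes - FlickrAuthorID",
    "FlickrNotes - FlickrAuthorName",
    "FlickrNotes - CreationDate",
}


def drop_unmapped(record):
    """Return a copy of `record` without the fields we decided not to map."""
    return {field: value for field, value in record.items() if field not in WONT_MAP}


if __name__ == "__main__":
    # Hypothetical record; "PageID" is an illustrative key only.
    record = {"PageID": 33665243, "StartIssue": None, "FlickrURL": "<url>"}
    print(drop_unmapped(record))
```

Keeping the list in one place means a later decision to map a field only requires removing its name from `WONT_MAP`.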

Flickr Machine Tags


We are currently reviewing user-generated Flickr machine tags and will integrate them into the final schema after a community decision process.
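To support that review, machine tags can be split into their parts so that, for example, species-name tags can be separated from everything else. The sketch below assumes the general Flickr machine tag shape `namespace:predicate=value`; the regular expression, the function name `parse_machine_tag`, and the `taxonomy:binomial` example tag are illustrative assumptions, not a description of the tags actually present in the BHL data.

```python
import re

# Assumed shape: namespace:predicate=value, with the value optionally quoted.
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*)"
    r":(?P<predicate>[A-Za-z][A-Za-z0-9_]*)"
    r"=(?P<value>.*)$"
)


def parse_machine_tag(tag):
    """Split a machine tag into (namespace, predicate, value).

    Returns None for ordinary tags that do not follow the assumed pattern.
    """
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    # Remove surrounding whitespace and double quotes from the value.
    value = match.group("value").strip().strip('"')
    return match.group("namespace"), match.group("predicate"), value


if __name__ == "__main__":
    # Illustrative tags only.
    for tag in ['taxonomy:binomial="Quercus alba"', "botany"]:
        print(tag, "->", parse_machine_tag(tag))
```

Grouping the parsed tags by namespace and predicate would give the community a compact overview of what the tags contain before deciding which ones belong in the schema.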