Property talk:P244
Documentation
Library of Congress name authority (persons, families, corporate bodies, events, places, works and expressions) and subject authority identifier [Format: 1-2 specific letters followed by 8-10 digits (see regex). For manifestations, use P1144]
(?:n|nb|nr|no|ns|sh)(?:[4-9][0-9]|00|20[0-2][0-9])[0-9]{6}|
Value must be formatted using this pattern (PCRE syntax). (Help) List of violations of this constraint: Database reports/Constraint violations/P244#Format, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P244#Single value, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P244#Unique value, SPARQL (every item), SPARQL (by value)
List of violations of this constraint: Database reports/Constraint violations/P244#Conflicts with P31, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P244#allowed qualifiers, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P244#Conflicts with P1144, search, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P244#Label in 'en' language, search, SPARQL
Pattern ^https?://id\.loc\.gov/authorities/names/([a-z]{1,2}[0-9]{8,10})$ will be automatically replaced with \1. Testing: TODO list
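As a rough illustration of how the format constraint and the autofix pattern quoted above behave, here is a minimal Python sketch. The names `is_valid` and `autofix` are made up for this example, and the trailing "|" shown after the constraint regex is left out on the assumption that it is a table artifact. This is not how Wikidata itself evaluates constraints.

```python
import re

# Format constraint as quoted in the documentation above
# (assumption: the trailing "|" is a layout artifact and is omitted).
P244_FORMAT = re.compile(
    r"(?:n|nb|nr|no|ns|sh)(?:[4-9][0-9]|00|20[0-2][0-9])[0-9]{6}"
)

# Autofix pattern as quoted in the documentation above.
URL_PATTERN = re.compile(
    r"^https?://id\.loc\.gov/authorities/names/([a-z]{1,2}[0-9]{8,10})$"
)


def autofix(value: str) -> str:
    """Replace a full id.loc.gov names URL with the bare identifier (\\1)."""
    return URL_PATTERN.sub(r"\1", value)


def is_valid(value: str) -> bool:
    """Return True if the whole value matches the format constraint."""
    return P244_FORMAT.fullmatch(value) is not None


if __name__ == "__main__":
    print(autofix("http://id.loc.gov/authorities/names/n79021400"))
    print(is_valid("n79021400"), is_valid("n/79/21400"))
```

Values that are not full names URLs pass through `autofix` unchanged, so the function can be applied before every validity check.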
This property is being used by:
Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)
Format Summary
Competing Formats
The format of the LCCN is not decided yet. The LCCN has been in use since the 1800s, and there are several competing standards for storing and displaying it. In the few Wikidata pages that use this property, two distinct formats are in use.
Possible formats are:
- "Normalized LCCN" format used by Library of Congress and MARC
- This format is described at www.loc.gov. In the case of Julius Caesar it would be "n79021400". Used by:
- Library of Congress URL http://id.loc.gov/authorities/names/n79021400.html URL
- MARC/XML standard http://id.loc.gov/authorities/names/n79021400.marcxml.xml see "<marcxml:controlfield tag="001">n79021400</marcxml:controlfield>"
- Displayed by Wikipedia Authority control templates, like at de:Gaius Iulius Caesar
- WorldCat http://worldcat.org/identities/lccn-n79022935 URL, One of the formats accepted by worldcat.org
- OCLC http://errol.oclc.org/laf/n79022935.html URL
- "Normalized LCCN" format with space(s) separating leading letter and the number, used by MADS and VIAF
- In the case of Julius Caesar it would be "n 79021400". Used by:
- MADS/XML standard http://id.loc.gov/authorities/names/n79021400.madsxml.xml see "<mads:identifier type="lccn">n 79021400 </mads:identifier>"
- VIAF http://viaf.org/viaf/sourceID/LC%7Cn+79021400 VIAF website can be accessed with LCCN (see URL). VIAF pages display LCCN as "n 79021400"
- Format with "/" separating the 3 parts of the number, used by Wikipedia Authority control templates in a few dozen languages
- In the case of Julius Caesar it would be "n/79/21400" (note that the third segment has no leading "0"). Used by:
- Authority control templates, like at de:Gaius Iulius Caesar. There are probably over 300k identifiers. The format was first introduced by de:Vorlage:Normdaten in 2009 and allows greater flexibility in creating identifiers in different formats. The symbol "/" is used because it is the only string separator that can be easily handled in the Wikipedia template "language", using the titleparts parser function. At this point it is unclear whether we need that flexibility, and if we do, then Lua has freed us from the constraints of the template "language".
- WorldCat http://www.worldcat.org/identities/lccn-n79-21400 URL. One of the formats accepted by worldcat.org requires non-normalized format easily constructed from "n/79/21400" format.
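To make the relationship between the three candidate formats concrete, here is a minimal Python sketch that derives the spaced form (#2), the slash form (#3) and the hyphenated WorldCat form from the normalized form (#1), and converts the slash form back. All function names are hypothetical, and the sketch assumes a one- or two-letter prefix followed by a two- or four-digit year and a six-digit serial number.

```python
import re

# Assumed shape of format #1: prefix, 2- or 4-digit year, 6-digit serial.
NORMALIZED = re.compile(r"^([a-z]{1,2})(\d{2}|\d{4})(\d{6})$")


def split_normalized(lccn: str) -> tuple[str, str, str]:
    """Split a format #1 value into (prefix, year, serial)."""
    m = NORMALIZED.match(lccn)
    if m is None:
        raise ValueError(f"not a normalized LCCN: {lccn!r}")
    return m.group(1), m.group(2), m.group(3)


def to_spaced(lccn: str) -> str:
    """Format #2: a blank between the prefix and the digits."""
    prefix, year, serial = split_normalized(lccn)
    return f"{prefix} {year}{serial}"


def to_slashed(lccn: str) -> str:
    """Format #3: three parts separated by "/", serial without leading zeros."""
    prefix, year, serial = split_normalized(lccn)
    return f"{prefix}/{year}/{int(serial)}"


def to_hyphenated(lccn: str) -> str:
    """Non-normalized form as seen in the WorldCat URL above."""
    prefix, year, serial = split_normalized(lccn)
    return f"{prefix}{year}-{int(serial)}"


def from_slashed(value: str) -> str:
    """Rebuild format #1 from format #3 by zero-padding the serial to 6 digits."""
    prefix, year, serial = value.split("/")
    return f"{prefix}{year}{serial.zfill(6)}"


if __name__ == "__main__":
    for convert in (to_spaced, to_slashed, to_hyphenated):
        print(convert("n79021400"))
    print(from_slashed("n/79/21400"))
```

Because every other form can be computed from format #1 and back, storing only one of them loses no information.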
Votes for format #1 "n79021400"
Support Format #3 would be the easiest for us to keep, but I think that in the long term we should abandon that home-brewed format, which is not used or recognized by anybody else. Technical limitations of the past made format #3 the only option in 2009, but I think we can safely abandon it now. My only concern is how to avoid confusing the users who will be filling in that property: both bot-drivers and individual users. --Jarekt (talk) 15:16, 11 March 2013 (UTC)
Support With respect to Gymels explanations below. --Kolja21 (talk) 17:34, 11 March 2013 (UTC)
Support I second the arguments mentioned above. --Monsieurbecker (talk) 18:28, 11 March 2013 (UTC)
Support a) #2 sucks; b) the slashes of #3 conflict with some elements of the printed card form; c) providing a pattern for syntax checking on entry should be feasible (some time); d) URL usage seems to converge to this form and construction of differing links should be still possible; e) most displays use a contiguous block of digits (8 or 10 depending on type 'A' or 'B') and therefore the problem of inserting the right amount of zeroes at the right position within the number shouldn't pose itself too often. -- Gymel (talk) 14:10, 12 March 2013 (UTC)
Comment Since there are no objections, I'll change the description in the Wikidata:List of properties. --Kolja21 (talk) 02:04, 14 March 2013 (UTC)
- OK --Jarekt (talk) 14:34, 14 March 2013 (UTC)
Support, too. --Ricordisamoa 02:29, 14 March 2013 (UTC)
Votes for format #2 "n 79021400"
Votes for format #3 "n/79/21400"
How does the LCCN work?
copied from: Wikidata:Project chat#Proposal: Allow string (and may be other future types) properties to be displayed with formatting defined by a template. --Kolja21 (talk) 21:57, 10 March 2013 (UTC)
- According to www.loc.gov "LCCNs have three components: prefix, year, and serial number. The prefix is optional; if present, it has one to three lowercase alphabetic characters. (Prefixes are maintained in a controlled list.) The year is two or four digits. (For 2000 and earlier the year is two digits, for 2001 and later, four digits.) The serial number (after normalization) is six digits. A normalized LCCN is a character string eight to twelve characters in length." Then the site lists the algorithmic rules for normalizing various forms of LCCNs. The current Library of Congress site uses normalized LCCNs, see en:Template:Authority control/LCCN. However, the WORLDCAT database, which also uses LCCNs, uses a different form, see en:Template:Authority control/WORLDCAT-LCCN. That is why, the last time this was discussed, the community decided to keep the LCCN as a triplet of codes (separated by "/"), normalize it on the fly to access www.loc.gov, and assemble it in other combinations to access other sites that use the LCCN as a key. That does not mean that we need to decide the same. --Jarekt (talk) 16:19, 9 March 2013 (UTC)
copy end
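The quoted passage mentions normalization rules without listing them. The following Python sketch shows one reading of those rules as I understand them from the LoC namespace document: remove blanks, drop everything from the first "/" onward, and zero-pad the part after a hyphen to six digits. The function name `normalize_lccn` is hypothetical, and the sketch should be checked against the LoC document before being relied on.

```python
def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN string (sketch of the LoC rules, see lead-in)."""
    # 1. Remove all blanks.
    value = raw.replace(" ", "")
    # 2. If there is a forward slash, drop it and everything to its right.
    value = value.split("/", 1)[0]
    # 3. If there is a hyphen, remove it and left-pad the part after it
    #    with zeros to six digits.
    if "-" in value:
        head, tail = value.split("-", 1)
        if not (tail.isascii() and tail.isdigit()) or len(tail) > 6:
            raise ValueError(f"cannot normalize serial part: {tail!r}")
        value = head + tail.zfill(6)
    return value


if __name__ == "__main__":
    for sample in ("n79-21400", "n 79021400 ", "78-65756"):
        print(repr(sample), "->", normalize_lccn(sample))
```

Note that normalization only fixes the shape of the string; it says nothing about whether the resulting number exists or belongs to an authority record.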
For the record, the conversation above has been selectively copied here. I have argued that slashes should not be stored in this property. Ironically the selective copying does not even support Kolja21's desire to have the slashes included. (After all, the above paragraph concludes, "That does not mean that we need to decide the same.")
Kolja21, in response to your talk page message saying that you reverted my example at WD:P, here are some URLs to sites of importance that "work" without slashes, hyphens, or anything. That is, they work with the identifier given by LCCN (for Van Gogh in this example, n79022935) with no extra "punctuation": http://worldcat.org/identities/lccn-n79022935, http://errol.oclc.org/laf/n79022935.html, http://id.loc.gov/authorities/names/n79022935.html. Moreover, does this URL with slashes "work"?: http://id.loc.gov/authorities/names/n/79/22935.html. No.
Here is a document titled Structure of the LC Control Number. There is absolutely no reference to slashes. Here is a document on the "normalization" of LCCN identifiers [1], which doesn't even include non-normalized examples with slashes (except in a different context, where they are also removed).
Here is an excerpt from the English Wikipedia article on LCCN: "The hyphen that is often seen separating the year and serial number is optional. More recently, the Library of Congress has instructed publishers not to include a hyphen." One of the slashes that you are supporting is equivalent to the hyphen that is mentioned in that excerpt. You will note that even the hyphen is a) "optional" and b) now discouraged.
I will not argue this triviality further, but if I'm wrong, I need something more than arguments from appeal to how Wikipedia stores this string for its own use, to convince me. It is not advisable to re-format canonical identifiers to suit one particular use (Wikipedia templates). Thank you. Espeso (talk) 22:41, 10 March 2013 (UTC)
- For the record: Your interpretation ("Kolja21's desire to have the slashes included") is wrong. I'm happy with both formats, but I want to make sure that editors know how to add the LCCN. If Wikipedia and Wikidata are using the LCCN in a different way, that is an important issue. If you are making errors (changing LCCN n/79/22935 to n7922935), of course others might do the same. Since there was no consensus in the discussion mentioned above, we can hold a poll here. --Kolja21 (talk) 00:32, 11 March 2013 (UTC)
Some more pointers
The document The LCCN Namespace from 2003 cited above IMHO has to be seen in the context of the lccn.info URI scheme (cf. http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/ ) which - like all .info schemes - nowadays should be considered as only of historic interest.
Part of the official MARC21 documentation are the following documents dealing extensively with the form of the LCCN ("CN" used to stand for "Call Number" but is now "Control Number"): MARC Bibliographic and MARC Authority. The document Structure of the LC Control Number (last revision 2006) is linked from the MARC Documentation page and appears to combine these.
This means that the official LCCNs still are exclusively of the form ('#' depicts blanks) "n##79051955#", i.e. in comparison to the "printed card form" "n79-51955" the different components are blank-padded to the right (alphabetic prefix, supplement number) and zero-padded to the left (serial number), and one has to know the distinction between the types 'A' (years up to and including 2000) and 'B' (years from and including 2001) to get it right (the result is always a string of 12 characters). Clearly the double spaces and trailing spaces of the official form of LCCNs will be a problem not only for human editors but also for generic software processing the numbers.
I checked the XML schema and crosswalks for MARCXML, MADS, and MODS and did not find any constraints, just the instruction "copy the number". Walt Whitman at id.loc.gov demonstrates this:
<mads:identifier type="lccn">n 79081476 </mads:identifier>
Fortunately enough the "sane" form of the LCCN as expressed in the Namespace document and utilized on VIAF (for linking: display delivers a zero-padded form with one intermediate blank), id.loc.gov (avoids presenting the number), lccn.loc.gov (shows the official form "n##79081476#" with three blanks for the Whitman example), worldcat.org(?) seems to become ever more widely accepted for linking and URL construction. At this point of time however it does not seem to be an official form. This may change suddenly as soon as the library of congress redeclares id.loc.gov or lccn.loc.gov-URLs from "convenience URLs for services" to official URIs for authority records (or better even: introduces the documented format as official "web-friendly alternative") but this has yet to happen.
For the time being I consider the form "n79081476" not different from the form with parsing hints "n/79/81476" known from the various templates: Both are convenient and friendly to processing, and both are neither official nor a direct reproduction of official presentations of the authority number. – The preceding unsigned comment was added by Gymel (talk • contribs). 12:14, 11 March 2013 (UTC)
- Gymel, thanks for this great summary. --Jarekt (talk) 13:47, 11 March 2013 (UTC)
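The 12-character MARC form described in this section can be sketched in Python as follows. The layout for type 'A' (three-character prefix, two-digit year, six-digit serial, one supplement position) follows the "n##79051955#" example above; the layout for type 'B' (two-character prefix, four-digit year, six-digit serial) is my reading of the Structure of the LC Control Number document and should be treated as an assumption. The function name `to_marc_form` and its `blank` parameter are hypothetical.

```python
import re

# Assumed input: normalized LCCN with optional prefix of up to 3 letters.
_NORMALIZED = re.compile(r"([a-z]{0,3})(\d{2}|\d{4})(\d{6})")


def to_marc_form(normalized: str, blank: str = " ") -> str:
    """Build the fixed-length 12-character form from a normalized LCCN.

    Pass blank="#" to make the padding visible, as in the text above.
    """
    m = _NORMALIZED.fullmatch(normalized)
    if m is None:
        raise ValueError(f"not a normalized LCCN: {normalized!r}")
    prefix, year, serial = m.groups()
    if len(year) == 2:
        # Type 'A': prefix padded to 3, then year, serial, blank supplement.
        return prefix.ljust(3, blank) + year + serial + blank
    if len(prefix) > 2:
        raise ValueError("type 'B' leaves room for a 2-character prefix only")
    # Type 'B': prefix padded to 2, then 4-digit year and serial.
    return prefix.ljust(2, blank) + year + serial


if __name__ == "__main__":
    print(to_marc_form("n79051955", blank="#"))
    print(to_marc_form("n2012076120", blank="#"))
```

With the default blank the result contains the double and trailing spaces that make the official form awkward to copy and paste.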
None of the above? :)
From a data modeling perspective, an LCCN is a complex data type, not an atomic one. As detailed above, it consists of 3 elements. As such, it should be stored as a complex data type--serializing it into a string is basically a hack and it hides semantics. And isn't semantics what we're all about here? :) I realize Wikidata presently doesn't support complex data types, and as far as I've seen from the data model, data types won't be extensible by users, only by developers. On the one hand, that's a shame, but on the other, perhaps what we should use as user-created data types are Wikidata items themselves? So, basically, I am suggesting that LCCNs be stored as items with their components specified by properties, at least for the time being. (This seems to be an issue common to all complex data types, there should probably be a central discussion about it somewhere.) Silver hr (talk) 23:00, 11 March 2013 (UTC)
- Well, LCCNs here are used as identifiers, and in semantics as in Semantic Web there is a strong opinion for them to be opaque. The official form differs from the form commonly used in URLs and URIs, and one only has to know the internal structure in order to convert one form into the other. The official form has so many issues with respect to whitespace normalization (you can't even reliably copy&paste it!) that it is completely impractical to use. The convenience form for usage in links or URIs is usually not presented to users; therefore it has to be extracted from URLs or deduced from the form presented on the web site at hand by applying the proper algorithm. Depending on what form you are presented with, the most naive conversion ("simply omit all punctuation and blanks") may yield an illegal number, and the variant "LCCN encoding with structural markup" known from the various authority control templates can be considered a kind of captcha that transports the meaning: "This entry was made by an LCCN-internal-structure-aware human or process and therefore has a slightly higher probability of being syntactically correct". -- Gymel (talk) 08:03, 12 March 2013 (UTC)
- I feel like for simplicity sake we should use the simplest usable form, unless there is some reason to believe that we might have a need for atomic pieces. If string form is a hack, it is Library of Congress's hack since they developed it and are using it as their internal identifier. --Jarekt (talk) 13:50, 12 March 2013 (UTC)
- If the purpose of Wikidata is to be a backing data store for Wikipedia, then it makes sense to treat LCCNs as opaque identifiers and simply store them as strings, just as with any other identifier. But from what I gather, the long-term goal of Wikidata is to be a global semantic data repository in its own right. With that in mind, LCCNs and other complex data types have to be acknowledged as such. For better or worse, an LCCN is by design not a mere serial number, it has internal semantics, which to some future user of Wikidata might be relevant, even if it is not relevant for the purpose of storing data from Wikipedia infoboxes.
- As a side note, from what I gather, the closest thing to an official LCCN format is the normalized/canonical form. What you refer to as official form seen here seems to me to be the format for a MARC record, of which an LCCN is only one part.
- Silver hr (talk) 00:16, 13 March 2013 (UTC)
- As of yesterday, templates can now call functions written in Lua. It would be quite simple to write Lua parser to split normalized LCCN into smaller components. --Jarekt (talk) 17:56, 14 March 2013 (UTC)
Alternative
Why not make that two or three or four separate properties ("LCCN with slashes", "LCCN normalized" etc.), and let Bots do the adding of the "other" forms once one is entered? (As long as Wikidata doesn't have proper handling of more complex formats). --89.244.173.70 06:53, 13 March 2013 (UTC)
- I was thinking about it, but each time you store the same exact data in multiple forms, you have to define what to do in cases where there are incompatible entries. Then you need the whole infrastructure for synchronizing them and detecting conflicts. The best approach is to keep a minimal amount of data and use algorithms to extract the other forms: you can derive "LCCN with slashes" from "LCCN normalized" and vice versa. --Jarekt (talk) 11:52, 13 March 2013 (UTC)
- I agree with Jarekt, that's a bad idea from a data modeling perspective. Any time you have a single datum recorded in multiple places you're setting yourself up for trouble, such as the potential for one copy to change and thus be inconsistent with the others. Plus, it's more maintenance work, even if it's done by bots, and there isn't anything to be gained really. (Further reading: w:Data redundancy.) Silver hr (talk) 14:58, 14 March 2013 (UTC)
format template at top in plain language not regex jargon
Can someone please do something with the formatting template at the lead. It is close to utter nonsense with its jargon. Only those familiar with regex could interpret it and our instructions should clearly not be aimed at regex-aware contributors. All that is going to do is to scare off people. — billinghurst sDrewth 06:04, 17 May 2013 (UTC)
- It's primarily aimed at the bot. You could add a plain text description at "allowed values" in property documentation template. -- Docu at 06:48, 17 May 2013 (UTC)
- a description is in http://www.loc.gov/marc/lccn-namespace.html. pasted it here and changed the format constraint pattern, it failed to match 18020208 in Q161531 --Akkakk 17:34, 28 June 2013 (UTC)
LCCN 000000000
Hi! I found at [2] a page with LCCN 000000000. Is it possible to add such validation everywhere? This might be solved with a blacklist implementation. Regards לערי ריינהארט (talk) 06:59, 22 October 2013 (UTC)
- Property talk:P214#blacklistVIAF Property talk:P214#new constraint violations about deprecated VIAF identifiers is about handling deprecated VIAF identifiers using a future (property) identifier blacklist. לערי ריינהארט (talk) 21:36, 1 November 2013 (UTC)
- see also Property talk:P1003#blacklistNLR about blacklisting NLR 000000000. 21:47, 1 November 2013 (UTC)
first General Punctuation character detected at property LCCN identifier
from: Wikidata:Project chat:first General Punctuation character detected and Talk:Q1162700
- Please do not delete the first LCCN (at Daniel Shays (Q1162700) ). The generated url is http://lccn.loc.gov/n79114009%E2%80%8F with an ending %E2%80%8F ...
Regards לערי ריינהארט (talk) 04:47, 1 November 2013 (UTC)
- Almost all "Format" violations at Wikidata:Database reports/Constraint violations/P244 should be fixed now (except non AC related LCCN numbers for books see section below). לערי ריינהארט (talk) 00:42, 2 November 2013 (UTC)
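The trailing %E2%80%8F in the URL above is the percent-encoded UTF-8 form of U+200F (RIGHT-TO-LEFT MARK), an invisible character from the General Punctuation block. A minimal Python sketch for spotting and stripping such characters before a value is saved could look like this; the function names are hypothetical, and treating every character of Unicode category Cf (format) as unwanted is an assumption made for this example.

```python
import unicodedata
from urllib.parse import quote


def find_invisible(value: str) -> list[str]:
    """List the code points of format characters (category Cf) in value."""
    return [
        f"U+{ord(ch):04X}"
        for ch in value
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible(value: str) -> str:
    """Remove format characters and surrounding whitespace."""
    cleaned = "".join(
        ch for ch in value if unicodedata.category(ch) != "Cf"
    )
    return cleaned.strip()


if __name__ == "__main__":
    stored = "n79114009\u200f"  # looks like "n79114009" on screen
    print(quote(stored))
    print(find_invisible(stored))
    print(quote(strip_invisible(stored)))
```

Since the character is invisible in the user interface, comparing the percent-encoded URL before and after cleaning is the easiest way to see the difference.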
non AC related LCCN numbers for books
Hi! At "Format" violations from Wikidata:Database reports/Constraint violations/P244 I found:
- The World's Navies (Q15052894): 78-65755
- Passenger Airliners of the United States 1926-1986 (Q15053275): 86-90432
- The World's Air Forces (Q15056983): 78-65756
These entries were generating invalid links. I changed them as follows:
- The World's Navies (Q15052894): 79317893
- Passenger Airliners of the United States 1926-1986 (Q15053275): 86062547
- The World's Air Forces (Q15056983): 79317894
In order to use Library of Congress authority ID (P244) only for AC related LCCN "numbers" a note should be added in the description.
I assume there is a need for a LCCN book number property at Wikidata. Regards לערי ריינהארט (talk) 00:42, 2 November 2013 (UTC)
- Interesting, for example, The World's Air Forces (Q15056983) has "Library of Congress Catalog Card Number 78-65756" printed directly in it. I looked at the link you changed, and in reality it is for another book (or at least another edition of the book), because the ISBN of LCCN 78-65756 is 0890092699, and the ISBN of 79317894 is 071537690X. Clearly, there are multiple editions of the book and the ISBN and LCCN numbers are different for each edition. When changing the LCCN number for the sake of the link, we need to be careful that we aren't actually giving it the LCCN of a different book or edition. Joshbaumgartner (talk) 04:13, 3 November 2013 (UTC)
- As said in the description "only for authority control" (not for books). BTW: ISBNs should only be linked to (single) editions, see Help:Sources. --Kolja21 (talk) 04:50, 3 November 2013 (UTC)
- So for those of us not steeped in the authority control world, should we not be putting the LCCN number printed in an edition in a LCCN statement when we add these items for use as sources? Joshbaumgartner (talk) 06:52, 3 November 2013 (UTC)
- @Joshbaumgartner: Exactly. The LCCN number printed in an edition should not be confused with an authority record for a (famous) work or an author. (Only works others have written about - like Hamlet or Orwell's 1984 - have an authority file.) --Kolja21 (talk) 22:49, 3 November 2013 (UTC)
- Awesome, thanks for the clarification! Sorry for any confusion I caused by adding those when I saw them, I'll leave it to those who know more! Joshbaumgartner (talk) 07:16, 4 November 2013 (UTC)
- I see that Kolja has answered as well. Please look at http://id.loc.gov/authorities/names/n82094035.html related to https://viaf.org/viaf/175794909/#Orwell,_George,_1903-1950._%7C_Nineteen_eighty-four a VIAF (work) identifier. No publisher is specified here. The German http://d-nb.info/gnd/4099325-5/about/html is linked here and the French IDP=cb119437788 http://catalogue.bnf.fr/ark:/12148/cb119437788/PUBLIC as well. Your book LCCN references do not start with an n.
- I will add these to Nineteen Eighty-Four (Q208460) . Regards לערי ריינהארט (talk) 05:25, 3 November 2013 (UTC)
- To make it clear: Authority control is mainly used for persons, but also for organisations, famous works (not single editions) or terms. For the complete list see Template:Entities. --Kolja21 (talk) 05:41, 3 November 2013 (UTC)
Single value vs pseudonym
The LCCN has separate files for pseudonyms. Example:
- Günter Wallraff (Q76529)
- Günter Wallraff, LCCN n50019462
- Hans Esser (pseudonym), LCCN n2012076120
Same case with Perlentaucher. Example:
- Peter Bieri (Q115630)
- Peter Bieri, name ID peter-bieri
- Pascal Mercier (pseudonym), name ID pascal-mercier
What do you think about the idea to add in these cases for #1 the rank "preferred" and for #2 the qualifier pseudonym (P742)? --Kolja21 (talk) 13:50, 19 January 2014 (UTC)
- BTW: Do we have qualifiers to distinguish between numeric and name IDs (see P866)? --Kolja21 (talk) 14:00, 19 January 2014 (UTC)
- There is another way: create an additional item for the pseudonym, like Daniel Handler (Q1060636) and Lemony Snicket (Q458346), and link them using some property. — Ivan A. Krestinin (talk) 20:08, 19 January 2014 (UTC)
- Handler/Snicket is an exception since there are two WP articles. I think we shouldn't start separate items for pseudonyms. It would make things complicated and produce redundancy. --Kolja21 (talk) 00:47, 20 January 2014 (UTC)
- The dual approach (one way if an article exists and another if it does not) is redundant too. Another issue is the blurring of the item's subject. For example, what does Günter Wallraff (Q76529) describe? Was Hans Esser born on 1 October 1942? How can a pseudonym be born? It can appear, not be born. Do I need to add <instance of (P31)> pseudonym (Q61002) in addition to <instance of (P31)> human (Q5)? And so on... — Ivan A. Krestinin (talk) 09:58, 27 January 2014 (UTC)
- The fact that many Authority Control sources (eg GND, VIAF) have 2 entries for author-pseudonym should make us think harder about this. Just because the author and his pseudonym are the same person physically doesn't mean they are the same culturally. An author may have good reasons for writing under a pseudonym, WD shouldn't take this distinction away from him. Or he may use 2 different pseudonyms in different situations: then it's significant which work was "created" by which pseudonym. --82.118.248.245 08:05, 29 May 2015 (UTC)
New label: LCNAF
I've added the property Library of Congress Control Number (LCCN) (bibliographic) (P1144) and changed the label of P244 (description: "Library of Congress Name Authority File") from LCCN to the correct abbreviation LCNAF. Sorry for the inconvenience. --Kolja21 (talk) 23:08, 6 February 2014 (UTC)
- See also Property talk:P1144#Name and constraint problems. --Kolja21 (talk) 06:37, 22 February 2014 (UTC)
- Returned to this identifier, after so many recent variations <ugh> May I suggest that more complex labels are not particularly helpful, and not the means to communicate. If we are to change it, can there please be a conversation and a consensus. Being able to find an expected label name, with some certainty and continuity, is helpful. — billinghurst sDrewth 03:26, 9 June 2017 (UTC)
- Grrrr, I remember that we then went to LCAuth. So I will progress to that. — billinghurst sDrewth 03:27, 9 June 2017 (UTC)
- The abbreviation 'LCAuth' doesn't seem to be used anywhere outside Wikidata and internally for English Wikipedia; could we find a better name? VIAF calls it 'Library of Congress/NACO'. The problem seems to be that we're combining the Library of Congress Names (officially called the NACO Authority File) and Library of Congress Subject Headings (SACO). I am not aware of there being an official name for this combination (I see none on the search page). 'LC/NACO/SACO ID' would most accurately reflect their provenance, but perhaps there is something better. 'Library of Congress authority ID'? AndrewNJ (talk) 15:37, 14 July 2017 (UTC)
- As there hasn't been any further discussion, I've changed the name to 'Library of Congress authority' as an expanded version of what is there right now and the closest thing I can find to an official name, but there are many other solutions to this, and I imagine someone can think of something better. AndrewNJ (talk) 10:28, 19 July 2017 (UTC)
LCSH
From the constraint pattern and actual usage it is clear that this property is also open to identifiers for Library of Congress Subject Headings under the SACO (Subject Authority Cooperative) programme:
- These are the identifiers / authority numbers starting with the letter s.
- id.loc.gov automatically redirects requests http://id.loc.gov/authorities/... to the appropriate databases http://id.loc.gov/authorities/names/... (LCNAF) and http://id.loc.gov/authorities/subjects/... (LCSH) depending on the prefix, thus there is no technical obstacle to the practice here.
- NACO and SACO try to keep their fields disjoint, i.e. usually there is either a LCNAF or a LCSH record for an item, not both. Thus there should be no semantical obstacle to the practice here.
- You cannot expect LCSH entries to be found in VIAF i.e. they have to be researched separately at the Library of Congress.
SACO publishes lists of recently approved headings, within these lists the terms are provided with LCSH identifiers prefixed with sp (p probably standing for provisional...) but they will enter the LCSH file with sh prefixes (cf. MARC21 documentation). Although evidence shows that the prefix sp is mechanically changed into sh keeping the "number part" without modification this is not much help: There is a gap of several weeks between publishing of the approval and the actual authority record showing up in the public database. Example from the February 2015 list of approved terms (published 2015-02-16):
110 Castillo de Turégano (Turégano, Spain) [sp2014100062]
can now be found with identifier sh2014100062 on id.loc.gov (dated 2015-03-04). Thus from a practical point of view the sp identifiers cannot be used here since the heading either already exists in the published database and has prefix sh or it does not (yet) show up at all. -- Gymel (talk) 09:21, 23 March 2015 (UTC)
- Done I've changed the label to "LCAuth" = Library of Congress Authorities (Q13219454). See also: Usage note: LC properties. --Kolja21 (talk) 04:55, 20 May 2015 (UTC)
MARC prefix: Explanation of usage
- n: Name or subject authority record keyed by LC (see Library of Congress Name Authority File (Q18912790))
- nb: Name or subject authority record originating in the British Library (see British Library (Q23308))
- nr: Name or subject authority record originating in the Research Libraries Information Network (RLIN) (see Research Libraries Group (Q7315111))
- no: Name or subject authority record originating in the Online Computer Library Center (OCLC) (see OCLC, Inc. (Q190593))
- sh: LCSH subject authority record distributed by LC (see Library of Congress Subject Headings (Q1823134))
- sj: Juvenile subject authority keyed by LC and distributed in the LC Annotated Children's Cataloging Program
- sp: Subject authority proposal record in the LC catalog. (When it is approved, the record will be distributed with the prefix sh.)
The p in sp stands for proposal: http://www.loc.gov/marc/lccn_structure.html
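The prefix-dependent redirect described in this section can be mirrored on the client side. The following Python sketch picks the names or subjects collection from the prefix and rejects sp proposal identifiers, which according to the discussion above never resolve in the published database. The mapping covers only the prefixes discussed here, and the function and constant names are hypothetical.

```python
import re

# Collections as described above: n* prefixes are LCNAF, sh is LCSH.
COLLECTION_BY_PREFIX = {
    "n": "names",
    "nb": "names",
    "nr": "names",
    "no": "names",
    "sh": "subjects",
}


def authority_url(identifier: str) -> str:
    """Build the id.loc.gov URL for a name or subject authority identifier."""
    m = re.fullmatch(r"([a-z]+)(\d+)", identifier)
    if m is None:
        raise ValueError(f"unexpected identifier shape: {identifier!r}")
    prefix = m.group(1)
    if prefix == "sp":
        # Proposals are published later under the same number with "sh".
        raise ValueError("sp identifiers are proposals, not published records")
    collection = COLLECTION_BY_PREFIX.get(prefix)
    if collection is None:
        raise ValueError(f"prefix not handled in this sketch: {prefix!r}")
    return f"http://id.loc.gov/authorities/{collection}/{identifier}"


if __name__ == "__main__":
    print(authority_url("n79021400"))
    print(authority_url("sh2014100062"))
```

Rejecting unknown prefixes outright, instead of guessing a collection, keeps the sketch from producing links for identifier families that were never agreed on for this property.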
I notice our current filter allows sn. I see that listed nowhere. Is that useful? 50.126.125.240 13:31, 25 May 2015 (UTC)
Thanks for the list. "sn" is part of the so called "pseudo LC control numbers". Used in records authenticated by NSDP and CONSER members. (Prior to 1984, "sn" control numbers were also assigned to LC minimal level cataloging records.) See: CONSER editing guide, 1994 ed. --Kolja21 (talk) 15:23, 25 May 2015 (UTC)
- Yes, and sn seems to clearly be a part of Library of Congress Control Number (LCCN) (bibliographic) (P1144) but not a part of LCAuth: http://lccn.loc.gov/sn82005086 Remember LCAuth is a subset of LCCN (but not the other way around). 50.126.125.240 15:31, 25 May 2015 (UTC)
- I've updated Constraint:Format. Hope it works ;) --Kolja21 (talk) 23:50, 25 May 2015 (UTC)
- Thanks—I am now wondering if we should also update it to remove the empty prefix possibility (again LCCN supports that but I know of no LCAuth identifiers that do). 50.126.125.240 02:30, 26 May 2015 (UTC)
Genre/Form
Added "gf" (Genre/Form) to the allowed prefixes. Eg http://id.loc.gov/authorities/genreForms/gf2011026174 = http://id.loc.gov/authorities/gf2011026174 (courtroom art).
This thesaurus has 1659 terms: http://id.loc.gov/search/?q=memberOf:http://id.loc.gov/authorities/genreForms/collection_LCGFT_General&start=1658 --Vladimir Alexiev (talk) 09:15, 29 May 2015 (UTC)
- That is interesting as it is definitely an LCCN authority and it also automatically redirects like the others:
- I added this to the constraint filter. 50.126.125.240 12:39, 29 May 2015 (UTC)
I'm going to revert that: There is also sh85033569, i.e. one term is for the characterization of "courtroom art" (in a broad sense), the other for the descriptions of material about "courtroom art" (usually in a narrower sense). Your gf example comes from the LCGFT list (LoC Form Genre Thesaurus I presume), a controlled vocabulary for use in MARC21 contexts. I consider this a case where scattered term lists and thesauri are made fit for the Linked Data world, i.e. now "they" also can be accessed via an "actionable" URL. But they remain thesaurus entries in their thesaurus; they are not exchangeable with the subject heading of the same wording. Or to put it the other way round: Not every identifier you can stuff into http://id.loc.gov/authorities should automatically qualify for property P244: IMHO it was coined for names and as a matter of fact also applies to subject headings, but should not be used to "track" any terminological project within the Library of Congress, or at least only those where we have some understanding of how they might relate to wikidata items. -- Gymel (talk) 13:25, 29 May 2015 (UTC)
- I am not sure I really understand your argument (but I am prepared to concede you probably understand this better than I). That said, perhaps you can explain it better (I have reread your comments several times and I am still missing something)? Since you brought up the topic: I am not sure I understand the real value in having multiple properties for different LCCNs anyway (LoC treats them differently and it is only one extra click away from linking to the same place for authority records anyway). Perhaps you can speak to that as well. Thank you. 50.126.125.240 00:37, 30 May 2015 (UTC)
- The point is, what does this property P244 stand for. Assume we had an item Qx for "courtroom art", what would it mean to have both sh85033569 and gf2011026174 as P244-values for that item? If you search for "courtroom art" at the LC you get the following results:
- http://id.loc.gov/vocabulary/graphicMaterials/tgm002655 from the Thesaurus of graphical materials
- http://id.loc.gov/authorities/genreForms/gf2011026174 from the Genre/Form Terms (our http://id.loc.gov/authorities/gf2011026174 actually is a redirect to that, I didn't notice that before)
- http://id.loc.gov/authorities/subjects/sh85033569 from the Subject Headings
- and http://id.loc.gov/authorities/subjects/sh2010025099 from the Subject Headings, too, and marked in red and accompanied with a deletion/deprecation note: gf2011026174 (from above) should be used instead.
- For me it's obvious that according to the Library of Congress the term courtroom art can be understood or used with two different "semantics". And I now assert that LC "authorities" in sufficient correspondence to wikidata items are always of type "subjects" (topical terms) and never of type "genreForms" (a broad classification for certain aspects of works). -- Gymel (talk) 12:55, 30 May 2015 (UTC)
- @Gymel: 1. Have you examined the 1659 gf terms to assert that "wikidata items are never of type genreForms"? Are you certain all genreForms are also present as Subjects?
- 2. Maybe be consistent and propose the banning of sj (Juvenile subjects) too?
- 3. "Two different semantics": Yes, if professional librarians from LoC think there should be several thesauri (subjects, graphicMaterials, genreForms), maybe they have their good reasons for that. Above you say one of these thesauri doesn't apply to WD (based on single example) and one of these semantics has no legitimate entry in WD, but what's your proof? Maybe in some language, gf2011026174 and sh2010025099 are expressed with 2 different terms, at which point two WD items can appear.
- 4. We should also examine tgm (graphicMaterials). --Vladimir Alexiev (talk) 14:26, 1 June 2015 (UTC)
- I checked several gf (Block diagrams, Reality television, Rai (music), Police films, Dhuns, Children's radio programs) and there always is a corresponding subject.
- It looks like gf is a conceptual subset of Subjects, but in a separate ConceptScheme and with no link between the two. The hierarchies don't always match. E.g. Dhuns has parent subjects "Hindustani music" and "Music--India" while the gf parent is only one, "Hindustani music" (presumably because "Music--India" is a precoordinated term while "Hindustani music", at least graphically, is an atomic term).
- For creating new data that links to WD I agree with Gymel: let's stick with the super-set (subjects) and ignore the sub-set (gf). But for integrating old data that refers to gf, the gf identifiers would be very useful --Vladimir Alexiev (talk) 14:42, 1 June 2015 (UTC)
- About the two semantics: you can see it on a precoordinated (compound) term like http://id.loc.gov/authorities/subjects/sh2010112128.html "School sports--Fiction": "School sports" is the Topic component, and "Fiction" is the genreForm component. --Vladimir Alexiev (talk) 15:13, 1 June 2015 (UTC)
- There is a document Introduction to Genre/Form Terms which on p. 2 has a section "Consistency between genre/form terms and subject headings", highlighting some odd differences in wording or non-correspondence, but stating The chief difference between genre/form terms and subject headings in the LCSH system lies in their application. Unlike most subject headings, genre/form terms are intended to be used as facets without further subdivision.
- LCSH is a system of "headings", so precombined terms will be assigned an identifier of their own; for most of them there won't ever be a Wikidata item with comparable semantic content. The LCGFT are a terms list, so structurally closer to Wikidata. But since they cannot be extended as freely as LCSH concepts I presume they are quite reluctant to over-differentiate, i.e. even if there exists a natural language term for some refinement of an already existing concept they'll probably try to live without that. Contrary to LCSH, "Detective and mystery fiction" is neither subdivided into "Detective fiction" and "Mystery fiction" nor into "Detective and mystery stories" and "Detective and mystery plays".
- Since the entities represented by LCSH and LCNAF are completely disjoint, I'm alright with the silent extension of P244 to cover links to both systems, especially since there exists a resolver at the LoC site which takes care of the requests. But LCSH and LCGFT are conceptually way too similar and at the same time very different in usage (asking a library system for LCGFT "courtroom art" and for LCSH "courtroom art" should almost by definition yield quite disjoint results), so within Wikidata we would have to make absolutely clear which of the two identifiers in P244 is which (and all editors would have to learn the distinction). Thus should we ever have the desire to note LCGFT terms, requesting a dedicated property seems the right way to go. -- Gymel (talk) 16:34, 1 June 2015 (UTC)
- Ok, I agree to omit this one --Vladimir Alexiev (talk) 16:54, 1 June 2015 (UTC)
Thesaurus of Graphic Materials
Similar questions as in the previous section can be considered for tgm (thesaurus of graphicMaterials). http://id.loc.gov/vocabulary/graphicMaterials.html says 7000 concepts, which can be accessed at http://id.loc.gov/search/?q=&q=cs:http://id.loc.gov/vocabulary/graphicMaterials. Any volunteers to check whether tgm is a subset of Subjects? --Vladimir Alexiev (talk) 15:01, 1 June 2015 (UTC)
Children's Subjects
Is sj (Children's Subjects) also a subset of Subjects, and should it therefore be omitted?
Eg http://id.loc.gov/authorities/childrensSubjects/sj96006051 = http://id.loc.gov/authorities/subjects/sh2001004401 (Religious leaders is a children's subject? YACK)
BTW you can use this syntax for precise searching aLabel:"religious leaders", see http://id.loc.gov/techcenter/searching.html. But mind you, it skips http://id.loc.gov/vocabulary/graphicMaterials/tgm012243
– The preceding unsigned comment was added by Vladimir Alexiev (talk • contribs) at 16:54, 1 June 2015 (UTC).
- When you visit http://id.loc.gov/ you see all available datasets. Clicking on them yields short descriptions like The Library of Congress Subject Headings Supplemental Vocabularies: Children’s Headings (LCSHAC) is a thesaurus which is used in conjunction with LCSH. It is not a self-contained vocabulary, but is instead designed to complement LCSH and provide tailored subject access to children and young adults when LCSH does not provide suitable terminology, form, or scope for children. LCSHAC records can be identified by the LCCN prefix "sj".
- I cannot understand the difference between sj96006051 (used for religious biography) and sh2001004401 (used for spiritual leaders). It might be minor, in the sense that the LCSHAC heading also applies to "Women religious leaders", "Indian religious leaders" (which Indians, BTW?), "Sexual minority religious leaders" and whatever narrower concepts proper LCSH provides. Or they could have not much in common, because the children's heading is more concerned with biography of "religious guidance"? There are probably huge manuals with scope notes one would have to study before deciding how strongly identically named concepts of the different schemes are related. Note also "religious leaders" as TGM entry http://id.loc.gov/vocabulary/graphicMaterials/tgm012243 (deprecated, use "Spiritual leaders" http://id.loc.gov/vocabulary/graphicMaterials/tgm009961 but I'm getting the impression that TGM also covers aspects of iconographic description) and http://id.loc.gov/vocabulary/ethnographicTerms/afset015199 (that's straightforward: LC has huge archival and objects holdings with respect to American history/heritage). -- Gymel (talk) 17:26, 1 June 2015 (UTC)
- I removed "sj" from the constraints. 50.53.1.33 02:04, 31 December 2016 (UTC)
Vandalism
Genre/Form (revisited)
@Vladimir Alexiev, Gymel: I think we should re-consider the rejection of the Genre/Form catalogue above.
The rejection appears to have been based on the notion that Library of Congress authority ID (P244) values should stand in 1:1 relation to Wikidata items. But this is simply not the case. Consider for example aerial photography (Q191839) / balloon aerial photograph (Q21082583). If we search for "aerial photographs" at LoC identities, there are three different hits for this in the Subject Headings catalogue, before we even consider the entry in the GF catalogue:
- http://id.loc.gov/authorities/subjects/sh99001928.html (Aerial photographs) -- LoC subject heading
- http://id.loc.gov/authorities/subjects/sh99001232.html (Aerial photographs) -- LoC form subdivision
- http://id.loc.gov/authorities/subjects/sh85001253.html (Aerial photographs) -- LoC topic subdivision
- http://id.loc.gov/authorities/genreForms/gf2011026032.html (Aerial photographs) -- Genre forms thesaurus
Searching for "aerial photography" brings in the further:
- http://id.loc.gov/authorities/subjects/sh85101264.html (Aerial photography) -- LoC subject heading
plus hits for the application of aerial photography to different fields.
I suspect that it is far from unusual for a subject entry to have multiple hits in this way: as a main subject heading, as a topic subdivision and as a form subdivision. So in the long run, I think the "uniqueness" constraint will need to be abandoned, perhaps to be replaced with a complex constraint that if multiple values are present they need to be qualified by different values of subject has role (P2868).
You might say we should just choose one of these links and arbitrarily exclude the rest. But that would be a big mistake, I think, because it then makes reverse lookups impossible -- it means that if people start with an LoC term, a query may not then be able to find the Wikidata item for them, nor relevant matches in other vocabularies and systems from Wikidata. That's something I think is not acceptable.
If we abandon the uniqueness requirement, there would seem to be no objection to including the Genre forms thesaurus. In my view, it would be very worthwhile to match. It's a small enough dataset that it makes for a nicely achievable project. It's also IMO a priority area for describing items, where it would be very very valuable to make sure we have solid coverage in a solid ontology. And the whole dataset is directly available for download [3], including immensely useful Broader terms / Narrower terms / Related terms / Variant name information, to compare to our current class trees. It's actually something I'd like to work on at the Wikidata workshop this weekend. This seems to be the obvious property to use for them, since the URL resolver is exactly the same. I would distinguish GF hits with the qualifier subject has role (P2868) = "genre or form" (item to be created).
Ideally, as I say, I'd propose to start work on this this weekend. A longer-term solution that occurs to me might be to split Library of Congress authority ID (P244) into two -- one property for names, which probably would be unique; another for subjects, which might well not be. As a by-product, it might also help query performance, because the number of uses of P244 is now very large. Splitting P244 would then mean that the query engine would be able to work directly with the subset of uses for subjects, or for people, without having to load all the others into memory and consider them too.
So: if I don't hear any objections, I hope to start adding GF values this weekend, and maybe also start a Mix'n'Match catalogue for anything I can't match directly. Is that okay with everybody? Jheald (talk) 13:18, 29 January 2018 (UTC)
- @Jheald: Thanks for the excellent examples! I don't remember the earlier discussion, but certainly we should bring all LoC concepts in. I'm not sure WD cares about the subject/form/topic distinction: when you apply a WD concept, you make the same distinction by picking an appropriate property, but that has no link to the particular LoC term.
- As for whether to split Library of Congress authority ID (P244) into Named Entities vs concepts: maybe it's not a bad idea, since almost all other thesauri in the world have such split (eg Getty AAT vs TGN & ULAN). I've always wished MnM has such distinction to avoid false match suggestions to AAT... But WD itself doesn't have such distinction. And GND keeps everything together...
- In any case, I think it's a big maintenance task to apply such split, so unless you can do it, I'd say leave it as is. --Vladimir Alexiev (talk) 12:03, 3 March 2018 (UTC)
- @Vladimir Alexiev: I thought about it again, and have now started new property proposals for Library of Congress Genre/Form Terms ID and Library of Congress Demographic Group Terms ID. I think maybe it does make sense to separate things out, when we easily can.
- As regards separating the LoC Subject Headings (LCSH) from the LoC Name Authority File (LCNAF), this would be very easy to do if we thought it was a good idea: simply propose a new property for LCSH, then get a bot to cut & paste all P244 statements with values starting "sh....." to the new property.
- The advantage would be the ability to have tighter constraints, queries that might be a little faster, and also if there was anyone who was only interested in one catalogue or the other, they wouldn't have to use string operators any more to distinguish between the two. The downside would be the disruption to anyone who's got code using P244 that would need to be updated to the new property. Jheald (talk) 11:47, 6 March 2018 (UTC)
- @Jheald: What's the point of splitting off Genre/Form and Demographic, if we leave Subjects mixed up with Named Entities? The former 2 are tiny, the latter 2 are huge. I just don't see the logic of splitting off the 2 tiny parts but keeping the 2 huge parts mixed.
- As for "easy with a robot": there are more concepts than the subjects with prefix "sh". Just like LCNAF has a bunch of prefixes not just "n". But I'm sure it's doable, and if you promise to write a robot to do the separation, I'll vote for your proposals --Vladimir Alexiev (talk) 15:46, 11 March 2018 (UTC)
- @Vladimir Alexiev: Interestingly there are currently only 5515 values that start 'sh' (tinyurl.com/ybzhl25d), and only 5525 that don't start 'n' (tinyurl.com/yddw2fsg), so this would be very very manageable to achieve -- probably less than an hour to execute with QuickStatements, without even the need to ask for bot approval.
- I'll propose a specific property for LCSH once the current two proposals go through -- let's see whether people support those first.
- Even though Genre/Form and Demographic are tiny, I do think it's quite useful to give them their own properties. It's well worth being able to avoid string operations if one can, if one wants to focus on a specific catalogue. And while our number of values for LCSH at the moment is quite small, it has the potential to be huge, so I think it is well worth giving it a property of its own. Jheald (talk) 22:23, 11 March 2018 (UTC)
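The "string operators" mentioned in this thread amount to a prefix test on each P244 value. A minimal Python sketch of that partition follows, assuming (as proposed above) that LCSH values are exactly those starting with "sh"; the function name `split_by_catalogue` is made up, and a real migration would of course go through a bot or QuickStatements rather than a local list.

```python
def split_by_catalogue(values):
    """Partition P244 values into ('sh' values, everything else)."""
    subjects = [v for v in values if v.startswith("sh")]
    others = [v for v in values if not v.startswith("sh")]
    return subjects, others


# Identifiers quoted elsewhere on this page.
sample = ["n79021400", "sh85033569", "n82091368", "sh2001004401", "n2003087915"]
subjects, others = split_by_catalogue(sample)
print(subjects)
print(others)
```

As noted in the thread, the non-"sh" side still mixes several name prefixes (n, nb, nr, no, ...), so a prefix test alone separates subjects from names but says nothing further about where a name record originated.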
Possible vandalism
Would someone with a better understanding of this property mind double-checking these 5 edits and my reverts? I don't know if I've done the right thing in each case, since some of the linked IDs no longer exist or never existed. Thanks, Jc86035 (talk) 11:28, 8 April 2019 (UTC)
Families
Please see Wikidata_talk:WikiProject_Names#LC_on_family_names and comment there. --- Jura 16:54, 4 July 2019 (UTC)
Format problem with numbers beginning with no2020
Something needs to be fixed in the regex for the format of the Library of Congress authority ID. For numbers beginning no2020, it is throwing up a format constraint error; see item Q1883105 for an example. These numbers are for authority IDs produced in OCLC. Numbers produced directly at the Library of Congress would start n2020, but I don't have an actual example to check whether that also produces an error message.
- Done by allowing 2000-2029 (I don’t see the point in limiting this to 2000-2019 + 2020, but if anyone thinks that’s better, be my guest and don’t forget to update the regex next year :P ) --Lucas Werkmeister (talk) 17:59, 7 January 2020 (UTC)
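The effect of the fix can be checked locally. The following Python sketch uses the format pattern as it appears in the property documentation at the top of this page; the identifiers starting no2020 and n2030 are hypothetical values made up for the test, not real authority records, and `is_valid` is just an illustrative helper name.

```python
import re

# Format constraint as quoted in the property documentation.
FORMAT = re.compile(
    r"(?:n|nb|nr|no|ns|sh)(?:[4-9][0-9]|00|20[0-2][0-9])[0-9]{6}"
)


def is_valid(value):
    """True if the whole value matches the format constraint."""
    return FORMAT.fullmatch(value) is not None


print(is_valid("n79021400"))      # two-digit year form
print(is_valid("no2020123456"))   # hypothetical OCLC-origin ID from 2020
print(is_valid("n2030123456"))    # hypothetical ID outside 2000-2029
print(is_valid("gf2011026174"))   # Genre/Form prefix is not allowed
```

Because the year part is `20[0-2][0-9]`, the hypothetical 2030 value is rejected, so the pattern will need another update before identifiers with that year appear.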
American novelists
Hello everyone. With respect to the constraint which requires the item not to be an instance of (P31) Wikimedia list article (Q13406463), what about list of American novelists (Q693651)? The only solution that came to my mind is to create an item "American novelist", but I don't think it would make much sense. --Horcrux (talk) 10:28, 17 June 2021 (UTC)
LC Medium of Performance Thesaurus for Music (LCMPT)
Added mp to the property URL match pattern (P8966) so that we can use the identifiers in LCMPT Library of Congress Medium of Performance Thesaurus for Music (Q97739256).
New value: ^https?:\/\/id\.loc\.gov\/authorities\/(?:(?:name|subject)s\/)?((?:n|nb|nr|no|ns|mp|sh)(?:[4-9][0-9]|00|20[0-2][0-9])[0-9]{6})
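A quick way to see what the new value accepts is to run it against a few URLs. This Python sketch copies the pattern verbatim from the line above; the helper name `extract_id` and the mp identifier are hypothetical and only serve the illustration.

```python
import re

# URL match pattern copied verbatim from the "New value" line above.
URL_PATTERN = re.compile(
    r"^https?:\/\/id\.loc\.gov\/authorities\/(?:(?:name|subject)s\/)?"
    r"((?:n|nb|nr|no|ns|mp|sh)(?:[4-9][0-9]|00|20[0-2][0-9])[0-9]{6})"
)


def extract_id(url):
    """Return the identifier captured by the pattern, or None."""
    match = URL_PATTERN.match(url)
    return match.group(1) if match else None


print(extract_id("http://id.loc.gov/authorities/names/n79021400.html"))
print(extract_id("http://id.loc.gov/authorities/subjects/sh85033569"))
# Hypothetical mp identifier directly under /authorities/.
print(extract_id("https://id.loc.gov/authorities/mp2013000001"))
print(extract_id("http://id.loc.gov/authorities/genreForms/gf2011026174"))
```

Note that the optional path group only covers names/ and subjects/, so as written the pattern matches an mp identifier only when it follows /authorities/ directly; a URL with any other path segment in between is not matched.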
LCNAF as a reference
Applicable "stated in" value (above) should be updated to Library of Congress Name Authority File, maybe.
It's no longer possible to cite LCNAF alone as a reference, maybe. I don't find the accepted usage explained here, but it may be here.
From Shannon Ravenel at LCAuth https://lccn.loc.gov/n82091368 :
At Q95991116 last hour I added Library of Congress authority ID and WorldCat Identities ID as usual. LCAuth provides yet unknown DoB
- "found: New stories from the South, 1986:CIP t.p. (Shannon Ravenel) CIP data sheet (b. 8/13/38)",
so I added date of birth 1938-08-13, preferred rank.
As a reference to add, I tried stated in LCNAF; retrieved 2022-05-07 (as usual pre-CoViD). Because option to "publish" was dead grey, I guessed a few ways (as? id? ...) to add the LC personal identifier n82091368. Nope. [Change to present tense, live.] Removing reference data, option to "publish" remains dead. Maybe some refresh problem? Start over: date of birth 1938-08-13, preferred rank. Now I have the live option to "publish" and do so.
Momentarily I will add from Shannon Ravenel at VIAF some more national library ID. Date of birth will remain flagged as needing a reference. (By the time you read this, English Wikipedia --one source cited here for 1938-- might have the full birthdate.) --P64 (talk) 19:38, 7 May 2022 (UTC)
- Library of Congress authority ID: n82091368, retrieved: 7 May 2022. I don't see the problem. --Kolja21 (talk) 21:27, 7 May 2022 (UTC)
- The problem was I didn't guess the property name to follow "stated in" and precede "retrieved" ---which two properties alone were adequate a couple years ago. I expected another generic property name {"as", "id", "label", "there ..."}.
- Experimenting now, I see that "LCau" expands to "Library of Congress Name Authority File" and "LC au" (spaced) expands to "Library of Congress Authorities" --as values for 'stated in'. Only the former expands to a subsequent property name: voila! that property is "Library of Congress Authorities ID".
- Trying another source, I see that "SFE" expands to "The Encyclopedia of Science Fiction" as 'stated in' value, and "SFE" expands to "The Encyclopedia of Science Fiction ID" as subsequent property name! (I'll remember that.)
- Similarly "GND" expands to both an appropriate 'stated in' value and an appropriate next property name. (I'll remember that, too.)
- Suppose my source for birthdate is BNF --as I still call someone's page at the French library. Below in section Identifiers, I have linked that source by adding BNF (expands to "Bibliothèque nationale de France ID", as property name in that new statement). Continuing my experiment, I see that "BNF" does not expand to a useful 'stated in' value; "BNFau" is no match; "BNF au" expands to "BnF authorities", with description that begins, "authority file for persons ...". OK, I know that's what I'm looking for. "BNF" alone expands to the property name for my next value 15592697x.
- NUKAT? I suppose I need something more as a 'stated in' value. "NUKATau" and "NUKAT au" do not expand ... --P64 (talk) 22:17, 29 May 2022 (UTC)
Add qualifiers P170 and P571 as floruit note
I would like to add P170 (creator) and P571 (inception) as qualifiers for Library of Congress authority ID. There are many, many items where Library of Congress authority ID (P244) is one of only a handful of properties, and surfacing even a tiny bit of extra information on who created the MARC record and when (i.e. adding a floruit note) is going to make a difference in disambiguating some of those. These are present in all records and, once we've tested them out, can be automatically added from the MARC (040$a and 008). Stuartyeates (talk) 02:01, 31 July 2024 (UTC)
- Interesting. Can you give an example? Kolja21 (talk) 15:41, 31 July 2024 (UTC)
- Hi Kolja21. Consider Martin Danner (Q94800459). A floruit hint of 2003 seems like a non-trivial addition to the information in the item. Looking again at a range of MARC records, including this, I'm liking source (Q3142800) for encoding 670a and 670b fields too. Stuartyeates (talk) 10:51, 1 August 2024 (UTC)
- LCAuth n2003087915 was added by User:Reinheitsgebot and has nothing to do with this Martin Danner, a person from the 16th century. IdRef 059676159, added by User:BboberBot, is a third person. Kolja21 (talk) 12:49, 1 August 2024 (UTC)
- PS: @Epìdosis: Why GND 13619074X is stated as the source for ISNI? There is no link to GND in this ISNI. Kolja21 (talk) 13:06, 1 August 2024 (UTC)
- There is indeed, but it is visible only in the RDF export, see https://d-nb.info/gnd/13619074X/about/lds. It is the same trick for the birth and death dates: the normal record always shows only years, but often the days are available looking into the RDF. Epìdosis 21:02, 1 August 2024 (UTC)
- For the Martin Danner described in https://id.loc.gov/authorities/names/n2003087915.html I have now created Martin Danner (Q128263181): software developer and entrepreneur. Epìdosis 21:09, 1 August 2024 (UTC)
- Upps, the LCAuth person lived in the 19th century (co-owner of hardware/grocery store). He is not a "software developer and entrepreneur". This is a fourth person, see Talk:Q128263181. --Kolja21 (talk) 22:24, 1 August 2024 (UTC)
- @Epìdosis: Thanks and one wish. Can you delete one of two sources given if both are identical? Kolja21 (talk) 21:20, 1 August 2024 (UTC)
- Back to the proposal by User:Stuartyeates. What additional information would the qualifiers provide?
- Library of Congress authority ID (P244): n2003087915
- creator (P170), MARC 040$a: DLC = Library of Congress?
- inception (P571), MARC 008: 030814n| acannaabn |n aaa = 3rd August or 8th of March [20]14?
- The questions concerning the example Martin Danner:
- Can additional qualifiers help finding mixed items?
- How can we find items that are mixed due to the import of unchecked IDs based on a VIAF cluster?
- How can we find items that have been created based on a VIAF cluster (example: Talk:Q126520689)
- --Kolja21 (talk) 23:14, 1 August 2024 (UTC)
- I have realised that I may have gone off half-cocked, without any real evidence on this. Does anyone have a download link for the MARC records that VIAF matches against? I've found the download link for the MARC clusters (https://viaf.org/viaf/data/ ~ 12 GB) and am downloading them with the intention of running an analysis of which tags / subtags are frequently used. Stuartyeates (talk) 22:11, 2 August 2024 (UTC)
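On the question of how to read the 008 value quoted in this thread: in MARC 21 the first six characters of field 008 (positions 00-05) hold the date the record was entered on file, in yymmdd order. A minimal Python sketch of that reading follows; the function name `date_entered` and the `century` parameter are assumptions added here, since the field itself carries only a two-digit year (2000 is assumed because the identifier starts n2003).

```python
from datetime import date


def date_entered(field_008, century=2000):
    """Read MARC 008/00-05 (yymmdd) as a date.

    The century is an assumption supplied by the caller, because the
    field stores only the last two digits of the year.
    """
    stamp = field_008[:6]
    if len(stamp) != 6 or not stamp.isdigit():
        raise ValueError("008/00-05 must be six digits (yymmdd)")
    year = century + int(stamp[0:2])
    month = int(stamp[2:4])
    day = int(stamp[4:6])
    return date(year, month, day)


# 008 value quoted above for n2003087915.
print(date_entered("030814n| acannaabn |n aaa"))
```

Under this reading the quoted value corresponds to 14 August 2003, i.e. the date the authority record was created, not a date in 2014.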
URL match pattern
I added an optional non-capturing group for the ".html" suffix so that DifoolBot would catch those URLs too. Noticed the issue in this bot edit. FYI: Difool. Samoasambia ✎ 16:53, 6 November 2024 (UTC)
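For readers unfamiliar with the construct, an optional non-capturing group lets an end-anchored pattern accept a URL with or without ".html" while still capturing only the identifier. The exact pattern now stored on the property is not quoted in this thread, so the Python sketch below is an assumption: it combines the authorities URL prefix discussed earlier with a simplified identifier shape, and `extract_id` is a made-up helper name.

```python
import re

# Assumed, simplified pattern: NOT the value stored on the property.
# (?:\.html)? is the optional non-capturing group; $ anchors the end.
PATTERN = re.compile(
    r"^https?://id\.loc\.gov/authorities/(?:(?:name|subject)s/)?"
    r"([a-z]{1,2}[0-9]{8,10})(?:\.html)?$"
)


def extract_id(url):
    """Return the captured identifier, or None if the URL does not match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None


print(extract_id("https://id.loc.gov/authorities/names/n2003087915.html"))
print(extract_id("https://id.loc.gov/authorities/names/n2003087915"))
print(extract_id("https://id.loc.gov/authorities/names/n2003087915.json"))
```

Because the group is non-capturing, group 1 stays the bare identifier in both accepted cases, and any other suffix such as ".json" is rejected by the end anchor.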