Wikidata:2020 report on Property constraints
Unless a software bug is introduced, the Wikibase data model, which Wikidata uses, is inviolable: a paragraph of text cannot be added as the value of a quantity-type Property, a statement cannot belong to two different Items, an Item cannot have several descriptions in the same language, etc. These restrictions are fixed and apply equally to all data. They give Wikidata basic representational consistency without preventing it from representing an unlimited number of entities in an unlimited number of ways. These rules do not vary with the stored information, but the presence of certain information does make other information necessary or impossible, so another system of rules is used: Property constraints. These rules are configured (or not) for each Property depending on its particular meaning and are improved over time. Their configurations are part of the Property data, and compliance with them is checked, but currently never enforced, by the software.
How to read this report
This report does not present decisions, only data and evidence-based conclusions and suggestions about Wikidata Property constraints to assist those who contribute to Wikidata in their regular decision-making processes.
This report cannot and is not intended to analyze all the features and efforts related to Property constraints. Some relevant aspects outside the scope of this report include, but are not limited to, constraint violations, user interfaces, the Lexeme namespace, the code of WikibaseQualityConstraints and the current workflow of the Wikidata development team.
As of December 2019, the percentages regarding the reported behavior and opinions of active Wikidata editors can be extended to all active Wikidata editors with a confidence level of 95% and a margin of error of ±10%. These figures are based on a survey sent to the active Wikidata editors who had enabled email reception on Wikidata.
The figures regarding the definition and configuration of Property constraints on Wikidata have no statistical margin of error because they were not sampled: all Wikidata Properties and their constraints were analyzed.
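The ±10% figure follows the usual margin-of-error formula for an estimated proportion. The report does not give the number of respondents, so the sketch below is only illustrative: it assumes simple random sampling and the worst case p = 0.5, and margin_of_error and required_sample_size are hypothetical helpers, not part of any survey tooling used for this report.

```python
import math

Z_95 = 1.96  # two-sided z-score for a 95% confidence level


def margin_of_error(n, z=Z_95, p=0.5, population=None):
    """Margin of error of a proportion estimated from n responses."""
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: sampling a large share of a
        # small population narrows the margin.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


def required_sample_size(margin, z=Z_95, p=0.5, population=None):
    """Smallest number of responses whose margin does not exceed `margin`."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(required_sample_size(0.10))  # 97
```

Under these assumptions, roughly a hundred responses are enough for ±10% at 95% confidence, and slightly fewer are needed when the population of active editors is itself small.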
Glossary
For the purposes of this report:
- An active Wikidata editor is one who has made 5 edits or more to Wikidata with a non-automated user account in the 30 days prior to the data capture (December 2019).
- A Property constraint is a Wikibase statement whose:
  - subject is a Wikibase Property (the constrained Property);
  - property is property constraint (P2302) on Wikidata, or the analogous Property defined in WikibaseQualityConstraints on other Wikibase installations;
  - value is a Wikibase Item representing a constraint type.
- An exception to a constraint is an exception to constraint (P2303) qualifier on that constraint.
For other terms and concepts, see Wikidata:Glossary.
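To make the glossary concrete, the following sketch models a Property constraint as a small Python data structure. The class and field names are illustrative assumptions, not the WikibaseQualityConstraints data model; only the identifiers P2302 (property constraint), P2303 (exception to constraint) and Q19474404 (single-value constraint) come from this report, and the exception Q1 is hypothetical.

```python
from dataclasses import dataclass, field

PROPERTY_CONSTRAINT = "P2302"       # main Property of a constraint statement
EXCEPTION_TO_CONSTRAINT = "P2303"   # qualifier listing exceptions


@dataclass
class PropertyConstraint:
    constrained_property: str   # statement subject, e.g. "P650"
    constraint_type: str        # statement value: Item of the constraint type
    qualifiers: dict = field(default_factory=dict)  # qualifier Property -> values

    @property
    def exceptions(self):
        """Entities listed with exception to constraint (P2303) qualifiers."""
        return self.qualifiers.get(EXCEPTION_TO_CONSTRAINT, [])

    def as_triple(self):
        """The constraint seen as a (subject, property, value) statement."""
        return (self.constrained_property, PROPERTY_CONSTRAINT, self.constraint_type)


# Hypothetical single value constraint with one exception.
c = PropertyConstraint("P650", "Q19474404", {EXCEPTION_TO_CONSTRAINT: ["Q1"]})
print(c.as_triple())  # ('P650', 'P2302', 'Q19474404')
```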
Goals
Throughout this report, goals are presented together with their degree of completion. These goals represent ideal states of Property constraints; while not all of them are necessarily achievable, the completion percentages show how far Wikidata currently is from those ideal states.
Goal: There are no Wikidata Properties without constraints. (98% completed)
These goals are not the only aspects that should be taken into account when estimating the current quality or maturity of Property constraints.
Knowledge, perception and ease of use
Best practices in business suggest that ensuring the quality of large volumes of data requires investment in strategic plans, carefully designed processes, software solutions and highly qualified staff. Against this background, achieving the optimal configuration of data quality systems for a project of the size and growth rate of Wikidata solely through volunteers, many of them anonymous and from all cultures and professional backgrounds, using only free software and open standards, seems an unprecedented challenge.
Wikipedia, the most comprehensive reference work in history, has amply demonstrated that these conditions do not limit, but rather encourage, the writing of a sufficiently useful free encyclopedia. However, when the goal is not to write an encyclopedia but to build an interoperable knowledge base, some of the What-You-See-Is-What-You-Get features disappear, mastery of natural language is no longer a key skill, and it is harder to contribute correct changes. Given these drawbacks, the software needs to encode more rules and processes to make quality management tasks easier and to reduce the cognitive load on volunteers, most of whom either have no specific training in data management or are unfamiliar with the particular conventions and use cases of Wikidata.
Although the direct analysis of user interfaces is outside the scope of this study, ease of use has been analyzed indirectly through the perceptions of the editors surveyed. When asked whether the warnings about constraint violations are clear enough, most active editors (51%) said that, "in general, they are clear enough" (😀); some of them (28%) said that "sometimes they are clear enough, but sometimes are not" (😐); and a smaller proportion (18%) said that "they are not as clear as they should be" (😠).
The experience and commitment of editors appear to influence how clear they find constraint violation messages. While 66% of administrators said that, in general, these warnings are clear enough (😀), 58% of Property creators and only 40% of the remaining active editors gave this response.
When asked how often the data needs to be corrected when a constraint violation appears on Wikidata, most active editors (63%) said "often" (😀), some of them (27%) said "about half the time" (😐) and a small proportion (9%) said "occasionally" (😠).
When asked how easy or hard to use they thought Wikidata's system of Property constraints is in general, 36% said it is "neither hard nor easy" to use (😐), 31% said it is "relatively hard" to use (😠), and 27% said it is "relatively easy" to use (😀). Although the goal of making the system easy to use for all active editors would be unrealistic, even unnecessary if some editors are not interested in data quality issues, some improvements may still be necessary.
User interfaces are the communication tool with the most potential, more effective at announcing new features to editors or instilling good habits in them than mailing lists, messaging groups, documentation or help pages, social profiles, talk pages or the project chat, which are sometimes not read. It is suggested to make user interfaces as self-explanatory as possible and, when additional information on Property constraints is actually required, to design one or more automatically generated documentation pages similar to Special:ListDatatypes to reduce the efforts needed to keep this information consistent, complete and updated.
Understanding of constraint types
Active editors were given a list of names of constraint types and were asked whether they knew and remembered what each of the listed constraint types was for, with three possible answers: "not really"; "yes, more or less"; or "yes, totally". Their answers reflect their knowledge of the constraint types and how descriptive the name of each constraint type is.
Constrained Properties
Goal: There are no Wikidata Properties without constraints. (98% completed)
The number of Wikidata Properties without constraints is low enough. 98% of Wikidata Properties have at least one constraint, and 61% have at least four. However, some constraints should be better defined to be effective, and some widely applicable constraint types are underused (e.g., the allowed entity types and Property scope constraint types) or poorly known.
Most of the unconstrained Properties (2% of all Properties) represent cases in which defining constraints was forgotten or was not known to be necessary. This could be related to the fact that the web user interface does not require or suggest adding statements, and constraints in particular, when a Property is created, and does not show the constraints section when there are no statements with property constraint (P2302).
Since many constraint types can be applied to some Property types, and very few to others, the differences in the number of constraints between Property types are significant. This is not necessarily a negative fact, since both the amount of information that each Property type is able to represent and the impact of each constraint type are variable.
Most constraint types do not apply to most Property types. However, when editing the statements, all the constraint types are suggested by the web user interface for all the Property types. As there are a considerable number of different constraint types, this opens the door to confusion between the names of different constraint types and makes some of the applicable constraint types less visible. If a user defines a constraint that cannot be applied to the constrained Property because of an incompatible Property type, the software system allows the constraint to be saved and the problem is not reported.
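A save-time check of the kind this paragraph finds missing could compare each constraint type with the Property types it can meaningfully apply to. The applicability table below is a hypothetical and deliberately incomplete example, and incompatible_constraints is an illustrative helper, not an existing WikibaseQualityConstraints function.

```python
# Hypothetical, partial applicability table: constraint type -> Property types.
# None means "applicable to every Property type" in this sketch.
APPLICABLE_TYPES = {
    "single value": None,
    "distinct values": None,
    "format": {"String", "ExternalId"},
    "difference within range": {"Time", "Quantity"},
}


def incompatible_constraints(property_type, constraint_types):
    """Return the constraint types that cannot apply to `property_type`.

    Constraint types missing from the table are not flagged.
    """
    problems = []
    for ctype in constraint_types:
        allowed = APPLICABLE_TYPES.get(ctype)
        if allowed is not None and property_type not in allowed:
            problems.append(ctype)
    return problems


print(incompatible_constraints("WikibaseItem", ["single value", "format"]))  # ['format']
```

A real check would need the complete, authoritative applicability of every constraint type; the point here is only that such a table makes the problem detectable before saving.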
Severity levels
Wikidata as a community, and the WikibaseQualityConstraints extension as an implementation, classify all constraints and their possible violations into three mutually exclusive categories. These categories are sometimes called statuses, after the English label of constraint status (P2316) and the original name in the system of template-based Property constraints, and sometimes levels, a less ambiguous name that 73% of active editors prefer over statuses. It is recommended to apply the necessary changes to use a single name for this categorization.
Severity levels are a little-known feature among editors. No reliable percentages can be provided in this regard, but several active editors who claim to have edited Property constraints say they do not know what the "severity levels" or "statuses" of Property constraints are, while others show that they are unaware that such a categorization exists.
The current severity levels of Property constraints are, ordered from highest to lowest severity, the mandatory constraint level, the normal constraint level and the suggestion level.
The names of the three severity levels have consistency problems, and the levels themselves are not clearly defined. The word that distinguishes the mandatory constraint level is an adjective ("mandatory"), the one that distinguishes the suggestion level is a noun ("suggestion"), and the normal level has no defined name. The word "suggestion" is sometimes understood as contradicting the name of the system to which it belongs, "Property constraints", while some users consider the word "mandatory" redundant. Some users also consider the term "mandatory constraints" misleading, because these constraints can be violated too. When active editors were asked whether they would change the current names of the severity levels to any of a list of suggestions, 67% said "maybe", 24% said they would definitely change the names, and 9% said they would not change them.
mandatory
The mandatory constraint level was introduced by the original system of template-based Property constraints and continues today, although there is no agreement on when to apply it or what it currently means. A constraint belongs to this level when it has the constraint status (P2316) qualifier with value mandatory constraint (Q21502408). This level represents 29.2% of constraints on Wikidata.
In 2014, the user in charge of developing and running the bot that updated the violation reports created a specific report of mandatory constraint violations, still in use, to "prevent data structure degradation," and requested to mark as mandatory those constraints without violations, without exceptions, without known conflicts, and whose constrained Properties had "existed for some time." According to the described criteria, this severity level was not determined by the meaning, support or relevance of the constraint, but it was understood as a rank to which, ideally, all constraints should aspire when they reached full compliance. From this perspective, the mandatory constraint level would prevent data degradation by preserving the past achievements in terms of data quality.
Today these conditions for specifying the mandatory constraint level are not met; they have not been openly discussed and, due to the low visibility of the page where they were written, their existence is often unknown. In particular, 0.3% of mandatory constraints have exceptions, and there is one exception to a mandatory constraint for every 125 mandatory constraints. When active editors were asked whether they agreed that "mandatory constraints should never have exceptions," 40% said they did not agree, 36% said they were not sure, and only 24% said they agreed. Administrators and Property creators seem less opposed than average to this idea, with agreement and disagreement split roughly 50-50.
normal
This is an implicit level without an agreed name and represents an undefined, default, normal, common or less-than-mandatory constraint level. A constraint belongs to this level when it has no constraint status (P2316) qualifier. This level represents most (69.4%) constraints on Wikidata.
When active editors were asked to keep the current names of the other severity levels and give a name to this one from a list of proposals:
- 36% of them chose "warning level" (with "warning" being a noun, as in the suggestion level),
- 21% of them chose "standard constraint level" (with "standard" being an adjective, as in the mandatory constraint level),
- 12% of them chose "non-mandatory constraint level", and
- 10% of them chose "normal constraint level".
Other active editors chose other options ("undefined constraint level", "default constraint level"), proposed other names ("hint level", "recommendation level"), did not feel able to make a choice or expressed the opinion that this level should not be considered as such or should not exist.
suggestion
The suggestion level, introduced in 2019, is intended to make editors aware of the possibility of making a change. In other words, the suggestion level specifies possibility, not necessity or convenience. A constraint belongs to this level when it has the constraint status (P2316) qualifier with value suggestion constraint (Q62026391). This level only represents 1.4% of constraints on Wikidata.
As announced by the Project Manager Community Communication for Wikidata, this severity level was created "[i]n order to allow more flexibility and subtlety in constraints definition" so that "editors can distinguish the really crucial constraint violations from the ones that only suggest additional edits that would be nice to make."
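The rules above that assign a constraint to a severity level can be written as a small function. The input shape (qualifier Property IDs mapped to lists of Item IDs) is an assumption of this sketch, and treating any other constraint status value as normal is a simplification the report does not address.

```python
MANDATORY = "Q21502408"   # mandatory constraint
SUGGESTION = "Q62026391"  # suggestion constraint
CONSTRAINT_STATUS = "P2316"


def severity_level(qualifiers):
    """Severity level of a constraint, given its qualifiers."""
    statuses = qualifiers.get(CONSTRAINT_STATUS, [])
    if MANDATORY in statuses:
        return "mandatory"
    if SUGGESTION in statuses:
        return "suggestion"
    # No constraint status (P2316) qualifier: the implicit normal level.
    return "normal"


print(severity_level({CONSTRAINT_STATUS: [MANDATORY]}))  # mandatory
```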
Exceptions
Goal: No constraint on Wikidata has exceptions. (95% completed)
4.6% (1,183) of constraints have one or more exceptions defined on Wikidata. In particular:
- 0.3% of mandatory constraints have exceptions,
- 6.5% of normal constraints have exceptions, and
- 4.9% of suggestion constraints have exceptions.
There is an average of 0.42 exceptions per constraint. Specifically, there is an average of:
- 0.008 exceptions of mandatory constraints per mandatory constraint,
- 0.6 exceptions of normal constraints per normal constraint, and
- 0.2 exceptions of suggestion constraints per suggestion constraint.
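As a consistency check, the overall figures can be recomputed from the per-level ones by weighting each severity level by its share of constraints (29.2% mandatory, 69.4% normal and 1.4% suggestion, as given in the severity level sections). This is plain arithmetic on the report's rounded numbers, so small gaps are expected.

```python
# Published (rounded) figures from this report, per severity level.
share = {"mandatory": 0.292, "normal": 0.694, "suggestion": 0.014}
exceptions_per_constraint = {"mandatory": 0.008, "normal": 0.6, "suggestion": 0.2}
share_with_exceptions = {"mandatory": 0.003, "normal": 0.065, "suggestion": 0.049}

# Weighted averages over the three levels.
overall_avg = sum(share[lvl] * exceptions_per_constraint[lvl] for lvl in share)
overall_with = sum(share[lvl] * share_with_exceptions[lvl] for lvl in share)

print(f"{overall_avg:.2f} exceptions per constraint")        # 0.42 exceptions per constraint
print(f"{overall_with:.1%} of constraints with exceptions")  # 4.7% of constraints with exceptions
```

The recomputed 0.42 matches the reported average; the 4.7% differs from the reported 4.6% by less than the rounding of the inputs allows.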
While the number of constraints with exceptions can be considered reasonably low, the high number of exceptions on these poorly defined constraints, which overall amounts to more than 2 exceptions for every 5 constraints, reveals some bad habits among the users who edit them. In some cases, users indiscriminately add constraints to serve a need that only interests them. In other cases, editors, prioritizing the short term or not daring to substantially modify a system they feel they do not know well, add false positives as exceptions to silence violation warnings instead of recognizing the constraints as wrong and rethinking them.
There are 15 constraints with more than 100 exceptions each. Two of these constraints have so many exceptions (2,095 and 683) that WikibaseQualityConstraints cannot process them, which results in an error message ("The parameters of this constraint could not be imported because they were too long"). The Property RKDartists ID (P650) has three poorly defined constraints of the same type which also add up to 683 exceptions.
It is suggested to propose a bot task that consists of automatically removing all constraints with an unreasonable number of exceptions (e.g., more than 70, 80, 100), and define and implement an upper hard limit (e.g., more than 100, 150, 200) above which software systems, including WikibaseQualityConstraints and the described bot, may fail. The sole application of a lower hard limit is discouraged, as it would not allow the automatic detection and removal of wrongly defined constraints, causing their accumulation with the maximum number of exceptions.
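The proposed two-threshold policy could look roughly like the sketch below. The thresholds reuse example values from the paragraph (100 for removal candidates, 200 as a hard limit); triage and its input shape (constraint ID mapped to number of exceptions) are hypothetical, and a real bot task would still need community approval before removing anything.

```python
REMOVAL_THRESHOLD = 100  # example value: above this, candidates for removal
HARD_LIMIT = 200         # example value: above this, software may fail


def triage(exception_counts, removal_threshold=REMOVAL_THRESHOLD, hard_limit=HARD_LIMIT):
    """Split constraints (id -> number of exceptions) into three buckets."""
    if removal_threshold >= hard_limit:
        raise ValueError("the removal threshold must be below the hard limit")
    buckets = {"ok": [], "remove": [], "over_hard_limit": []}
    for constraint_id, n in sorted(exception_counts.items()):
        if n > hard_limit:
            buckets["over_hard_limit"].append(constraint_id)
        elif n > removal_threshold:
            buckets["remove"].append(constraint_id)
        else:
            buckets["ok"].append(constraint_id)
    return buckets


print(triage({"c1": 3, "c2": 150, "c3": 2095}))
# {'ok': ['c1'], 'remove': ['c2'], 'over_hard_limit': ['c3']}
```

Keeping the removal threshold below the hard limit is what lets the bot catch badly defined constraints before they reach the point where software fails, which is the paragraph's argument against relying on a hard limit alone.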
Although there are many technical tasks to be solved in Wikibase and analyzing them is not one of the objectives of this report, it should be noted that the problem described in phab:T168379 is a relevant source of exceptions and false positives for the range, difference within range and contemporary constraint types; some editors have requested that this fact be highlighted in this report.
In order to draw complete and clarifying conclusions on the quality and maturity of Property constraints, the number of exceptions should be studied together with the number, distribution, authorship and context of constraint violations.
Fictional entities
Only 0.13% of Wikidata Items are fictional classes or instances, but they represent 2% of exceptions, which is more than 15 times the proportion of fictional Items on Wikidata.
Listing as exceptions all the individual fictional Items to which certain constraints do not apply is not feasible. But there is another problem that does not only concern Property constraints: not all data on fictional Items are fictional, and there is no method for unambiguously distinguishing fictional data from non-fictional data on Wikidata. Most Properties can specify either fictional or non-fictional facts from any universe and, with the current data, it is not possible to algorithmically solve this ambiguity.
Until a correct representation of fictional facts is agreed and extended, some lists of exceptions and some lists of constraint violations will remain longer than desirable, some wrong data from fictional entities will not be detected as such, and some normal constraints will be prevented from being promoted to the mandatory severity level even when their performance with non-fictional facts is perfect.
Constraint types
The information and specific goals of each constraint type are presented below. They should be complemented with the rest of the report.
The absence of exceptions is indicated as a goal only for those constraint types with at least 50 constraints.
First constraint types
This section presents information on the current constraint types that were conceived between 2013 and 2015. These were initially represented only as templates and checked by bots run by the community.
single value
Goal: No single value constraint on Wikidata has exceptions. (92% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "single value" is for. (91% completed)
The single value constraint type (see also: Item single-value constraint (Q19474404) [talk]; help page [talk]; PHP class) specifies that entities should have no more than one claim with the constrained Property.
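A minimal sketch of what a single value check does, assuming each entity is a dict with an "id" and a "claims" mapping from Property IDs to lists of values (a simplified shape, not the Wikibase JSON format). It ignores ranks and any additional constraint parameters that the real checker may support.

```python
def single_value_violations(entities, prop, exceptions=()):
    """IDs of entities with more than one statement for `prop`."""
    skip = set(exceptions)  # exception to constraint (P2303) entities
    return [
        e["id"]
        for e in entities
        if e["id"] not in skip and len(e["claims"].get(prop, [])) > 1
    ]


entities = [
    {"id": "Q1", "claims": {"P1": ["a"]}},
    {"id": "Q2", "claims": {"P1": ["a", "b"]}},
    {"id": "Q3", "claims": {"P1": ["a", "b"]}},
]
print(single_value_violations(entities, "P1", exceptions=["Q3"]))  # ['Q2']
```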
It is the constraint type with the third-most constraints defined on Wikidata (4,307, that is, 14.9% of constraints), after the item requires statement and format constraint types.
Single value constraints are applied:
- to almost every Property (94.3%) with distinct values constraints,
- to 85.8% of Properties with format constraints, and
- to 81.2% of Properties with item requires statement constraints.
format
Goal: All Wikidata Properties of the ExternalId type have a format constraint. (90% completed)
Goal: No format constraint on Wikidata has exceptions. (97% completed)
Goal: No format constraint allows an infinite number of values. (37% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "format" is for. (82% completed)
The format constraint type (see also: Item format constraint (Q21502404) [talk]; help page [talk]; PHP class) specifies that values with the constrained Property should meet a certain pattern defined by a regular expression.
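A minimal sketch of a format check, assuming the whole value must match the pattern (hence re.fullmatch). Python's regular expression dialect is close to, but not necessarily identical with, the dialect used by the checkers that evaluate Wikidata format constraints.

```python
import re


def format_violations(values, pattern):
    """Values that do not match `pattern` as a whole."""
    regex = re.compile(pattern)
    return [v for v in values if regex.fullmatch(v) is None]


print(format_violations(["123", "0123", "12a"], r"[1-9]\d*"))  # ['0123', '12a']
```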
This is the second constraint type with the most constraints defined (4,562, that is, 15.8% of constraints on Wikidata), the one with the most checks (728,425,356, that is, 20.4% of constraint checks on Wikidata) and one of the most important constraint types for external identifiers.
Format constraints are applied:
- to 90.8% of Properties with Commons link constraints,
- to 88.1% of Properties with distinct values constraints, and
- to 84.4% of Properties with single value constraints.
The most common regular expressions for format constraints on Wikidata are the following. Regular expressions 1 and 3 represent the same pattern, which is used in more than 700 different Properties.
63% of format constraints on Wikidata accept an infinite number of distinct values and most patterns for format constraints are too generic. These facts do not mean that current format constraints are not useful, but that they have significant room for improvement. It is advisable to avoid infinite repetitions (*, + or {…,}) whenever possible and to better check and fit the actual range of acceptable values for each Property.
Besides, dots (.), non-digit metacharacters (\D), non-whitespace metacharacters (\S), non-word metacharacters (\W) and negated character classes ([^…]) are discouraged because they allow the values to contain thousands of different invalid characters without this problem being obvious to the users who write or read the patterns. This is the case for the fourth ([^\s\/]+) and tenth ([^ ]+) most used regular expressions for format constraints on Wikidata. Even when the set of acceptable characters is large and uncertain, it is advisable to try to specify all the potentially acceptable characters rather than listing those that are not.
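The two recommendations above (avoid unbounded repetition, avoid overly permissive constructs) can be turned into a rough linter. audit_pattern is a hypothetical helper and a lightweight scan, not a full regex parser: it tracks escapes and character classes but ignores rarer syntax such as possessive quantifiers or inline flags.

```python
import re


def audit_pattern(pattern):
    """Flag unbounded repetition and permissive constructs in a regex."""
    issues = set()
    i, in_class = 0, False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            nxt = pattern[i + 1:i + 2]
            if nxt in ("D", "S", "W"):  # negated shorthand classes
                issues.add("\\" + nxt)
            i += 2  # skip the escaped character
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif c == "[":
            in_class = True
            j = i + 1
            if pattern[j:j + 1] == "^":
                issues.add("[^...]")  # negated character class
                j += 1
            if pattern[j:j + 1] == "]":  # a leading "]" is a literal
                j += 1
            i = j
            continue
        elif c == ".":
            issues.add(".")
        elif c in "*+":
            issues.add("unbounded")
        elif c == "{" and re.match(r"\{\d+,\}", pattern[i:]):
            issues.add("unbounded")  # {n,} has no upper bound
        i += 1
    return issues


for p in [r"[1-9]\d{0,5}", r"[^\s\/]+"]:
    print(p, sorted(audit_pattern(p)))
# [1-9]\d{0,5} []
# [^\s\/]+ ['[^...]', 'unbounded']
```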
The knowledge and intuition that active editors have are sufficient to interpret the simplest regular expressions composed of literals and digit ranges, but insufficient for all other regular expressions. The following table presents the hit rates that resulted when active editors were asked to indicate which strings matched each of the four regular expressions listed, among the most widely used in format constraints. These results may have an optimistic bias because they only consider active editors who were willing to respond. Several values close to 50% should be interpreted as a zero level of understanding, since this is the approximate percentage that would be obtained with random responses.
String | [1-9] | [1-9]\d* | [^\s\/]+ | [1-9]\d{0,5} |
---|---|---|---|---|
0 | 74% | 85% | 57% | 87% |
1 | 96% | 54% | 48% | 50% |
21.5 | 96% | 80% | 52% | 83% |
24 | 80% | 70% | 43% | 61% |
048 | 85% | 87% | 54% | 89% |
48/9 | 96% | 93% | 65% | 100% |
E480 | 98% | 98% | 63% | 100% |
48000 | 83% | 63% | 43% | 52% |
9999999 | 83% | 70% | 48% | 91% |
#Á@½þ€ŋ | 100% | 100% | 59% | 98% |
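For reference, the strings that each of the four patterns actually matches can be computed directly. The sketch assumes that a pattern must match the whole value; under that assumption [1-9] accepts only "1", [1-9]\d* and [1-9]\d{0,5} differ only on "9999999", and [^\s\/]+ accepts every test string except "48/9".

```python
import re

PATTERNS = [r"[1-9]", r"[1-9]\d*", r"[^\s\/]+", r"[1-9]\d{0,5}"]
STRINGS = ["0", "1", "21.5", "24", "048", "48/9", "E480", "48000", "9999999", "#Á@½þ€ŋ"]


def match_table(patterns, strings):
    """Map each pattern to the strings it matches in full."""
    return {p: [s for s in strings if re.fullmatch(p, s)] for p in patterns}


for pattern, matched in match_table(PATTERNS, STRINGS).items():
    print(f"{pattern:<14} {matched}")
```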
Regular expressions are currently stored as values of Wikibase Properties of the String type. It is suggested to create a RegularExpression (or Pattern, or similar) Property type to better edit, monitor, manage, check and process patterns represented by regular expressions. Some technical issues related to the inability to manage regular expressions, which are not addressed by the current study, are described in phab:T176312, phab:T214378, phab:T236150 and phab:T240884.
distinct values
Goal: All Wikidata Properties of the ExternalId type have a distinct values constraint. (93% completed)
Goal: No distinct values constraint on Wikidata has exceptions. (97% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "distinct values" is for. (77% completed)
The distinct values constraint type (see also: Item distinct-values constraint (Q21502410) [talk]; help page [talk]; PHP class), originally unique value as a template, specifies that each value should only be used once on Wikidata with the constrained Property. Analogous to SQL UNIQUE constraints, it is one of the oldest and most basic constraint types of database management systems.
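The check is essentially a grouping query, similar in spirit to SQL's GROUP BY value HAVING COUNT(DISTINCT entity) > 1. The sketch assumes entities are dicts with an "id" and a "claims" mapping from Property IDs to lists of values; the shape and the helper name are illustrative.

```python
from collections import defaultdict


def distinct_values_violations(entities, prop, exceptions=()):
    """Map each value used by more than one entity to the entities using it."""
    skip = set(exceptions)
    users = defaultdict(set)
    for e in entities:
        if e["id"] in skip:
            continue
        for value in e["claims"].get(prop, []):
            users[value].add(e["id"])
    return {v: sorted(ids) for v, ids in users.items() if len(ids) > 1}


entities = [
    {"id": "Q1", "claims": {"P1": ["123"]}},
    {"id": "Q2", "claims": {"P1": ["123"]}},
    {"id": "Q3", "claims": {"P1": ["0123"]}},
]
print(distinct_values_violations(entities, "P1"))  # {'123': ['Q1', 'Q2']}
```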
This is the fourth constraint type with the most constraints (4,277, that is, 14.8% of constraints) defined, the second with the most exceptions (3,497) defined and the one with the highest ratio of constraint exceptions per constraint (0.8 exceptions per constraint) on Wikidata.
Applicable to almost any Property type, it is particularly important for external identifiers: 96% of distinct values constraints belong to Properties of the ExternalId type, and 93% of Properties of the ExternalId type have distinct values constraints.
Because of their importance for external identifiers, distinct values constraints are applied:
- to 93.5% of Properties with single value constraints,
- to 88.8% of Properties with format constraints, and
- to 81.8% of Properties with item requires statement constraints.
The distinct values constraint type and the format constraint type are complementary and closely related. Poorly defined format constraints allow several identifiers that refer to the same resource to be stored differently (for example, "123" and "0123") and prevent the corresponding distinct values constraints from detecting duplicates. In the opposite situation, for the Property of an external catalog completely matched on Wikidata, when a distinct values constraint is combined with a well-defined format constraint, both act as a virtual full constraint, which turns any addition or change into at least one violation.
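The interplay described above can be shown with one identifier stored in two spellings (hypothetical data). With a loose pattern both spellings pass the format check, and because they are different strings the distinct values check sees no duplicate; a stricter pattern that forbids leading zeros exposes the non-canonical spelling.

```python
import re
from collections import Counter


def duplicate_values(values):
    """Values occurring more than once (what a distinct values check detects)."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


def format_violations(values, pattern):
    """Values that do not match `pattern` as a whole."""
    return [v for v in values if re.fullmatch(pattern, v) is None]


stored = ["123", "0123"]  # the same identifier written two ways

# Loose pattern: no format violation and no duplicate detected.
print(format_violations(stored, r"\d+"), duplicate_values(stored))  # [] []
# Stricter pattern: the non-canonical spelling becomes a violation.
print(format_violations(stored, r"[1-9]\d*"))  # ['0123']
```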
conflicts-with
Goal: No conflicts-with constraint on Wikidata has exceptions. (97% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "conflicts with" is for. (86% completed)
The conflicts-with constraint type (see also: Item conflicts-with constraint (Q21502838) [talk]; help page [talk]; PHP class) specifies that entities with the constrained Property should not have certain other Properties or statements.
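A minimal sketch of the conflicts-with logic, reduced to a single conflicting Property with an optional list of conflicting values; real constraints may list more, and the entity shape (an "id" plus a "claims" mapping) and helper name are assumptions.

```python
def conflicts_with_violation(entity, prop, conflicting_prop, conflicting_values=None):
    """True if `entity` uses `prop` together with a conflicting statement.

    With conflicting_values=None any statement with `conflicting_prop`
    conflicts; otherwise only statements with one of those values do.
    """
    claims = entity["claims"]
    if not claims.get(prop) or not claims.get(conflicting_prop):
        return False
    if conflicting_values is None:
        return True
    return any(v in conflicting_values for v in claims[conflicting_prop])


entity = {"id": "Q1", "claims": {"P1": ["x"], "P2": ["Q5"]}}
print(conflicts_with_violation(entity, "P1", "P2"))          # True
print(conflicts_with_violation(entity, "P1", "P2", {"Q6"}))  # False
```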
This is the second constraint type with the most checks (664,977,413, that is, 21.1% of checks) on Wikidata, only after the format constraint type. It can be applied to all Property types, although it is especially used for Properties of the ExternalId and WikibaseItem types. It has a high proportion of mandatory constraints (45%).
item requires statement
Goal: No item requires statement constraint on Wikidata has exceptions. (94% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "item requires statement" is for. (86% completed)
The item requires statement constraint type (see also: Item item-requires-statement constraint (Q21503247) [talk]; help page [talk]; PHP class), originally item, then item requires claim, specifies that entities with the constrained Property should also have certain other statements.
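The item requires statement logic mirrors conflicts-with: the constraint only concerns entities that use the constrained Property, and for those it requires a statement with another Property (optionally with specific values). The entity shape and helper name are assumptions of this sketch, which handles a single required Property.

```python
def item_requires_statement_violation(entity, prop, required_prop, required_values=None):
    """True if `entity` uses `prop` but lacks the required statement.

    With required_values=None any statement with `required_prop` satisfies
    the constraint; otherwise at least one of its values must be listed.
    """
    claims = entity["claims"]
    if not claims.get(prop):
        return False  # entities without `prop` are not concerned
    present = claims.get(required_prop, [])
    if required_values is None:
        return not present
    return not any(v in required_values for v in present)


e1 = {"id": "Q1", "claims": {"P1": ["x"]}}
e2 = {"id": "Q2", "claims": {"P1": ["x"], "P31": ["Q5"]}}
print(item_requires_statement_violation(e1, "P1", "P31"))  # True
print(item_requires_statement_violation(e2, "P1", "P31"))  # False
```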
This is the constraint type with the most constraints (5,892, that is, 20.4% of constraints on Wikidata) and exceptions (3,747) defined.
It would be possible to change the name of this constraint type to subject requires statement or similar both to accommodate new Wikibase entity types other than Items and to avoid confusion with the value requires statement constraint type.
type
Goal: No type constraint on Wikidata has exceptions. (95% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "type" is for. (77% completed)
The type constraint type (see also: Item subject type constraint (Q21503250) [talk]; help page [talk]; PHP class) specifies that entities with the constrained Property should be instances or subclasses of other entities.
A high number of constraints (3,807) on almost all Property types and a high number of checks (290,281,292) belong to this constraint type, which shares space with all the other constraint types.
Concerning the relations that these constraints specify:
- 89.3% of type constraints require a relation of "instance of,"
- 7.3% require a relation of "subclass of," and
- the remaining 3.4% allow either of the two relations.
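The three relations above can be sketched as a walk up the subclass of (P279) hierarchy, starting from the entity's instance of (P31) values, its own subclass of (P279) values, or both. The relation labels "instance", "subclass" and "either", the toy hierarchy and the helper names are hypothetical; this is a simplified model, not the WikibaseQualityConstraints implementation.

```python
INSTANCE_OF, SUBCLASS_OF = "P31", "P279"


def superclasses(cls, subclass_of):
    """`cls` plus every class reachable from it through subclass of links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:  # also protects against cycles
            seen.add(current)
            stack.extend(subclass_of.get(current, []))
    return seen


def satisfies_type_constraint(entity, classes, relation, subclass_of):
    """relation is "instance", "subclass" or "either" (hypothetical labels)."""
    starts = []
    if relation in ("instance", "either"):
        starts += entity["claims"].get(INSTANCE_OF, [])
    if relation in ("subclass", "either"):
        starts += entity["claims"].get(SUBCLASS_OF, [])
    targets = set(classes)
    return any(superclasses(s, subclass_of) & targets for s in starts)


# Hypothetical toy hierarchy: human -> person -> agent
hierarchy = {"human": ["person"], "person": ["agent"]}
alice = {"claims": {INSTANCE_OF: ["human"]}}
print(satisfies_type_constraint(alice, ["agent"], "instance", hierarchy))  # True
print(satisfies_type_constraint(alice, ["agent"], "subclass", hierarchy))  # False
```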
The name of this constraint type may be considered inaccurate or confusing for two reasons:
- rdf:type is an RDF property to state that a resource is an instance of a class, but the type constraint type is also used to specify that a resource should be a subclass of another class, for which rdfs:subClassOf is used instead;
- the word type is already used to refer to Property types (ExternalId, String, etc.), value types, entity types (Item, Lexeme, etc.) and, most confusingly, constraint types.
The name subject class is proposed, since this constraint type is used to specify the class of which the subject entity is an instance or a subclass. Likewise, for the current value type constraint type, the name value class is proposed.
allowed qualifiers
Goal: No allowed qualifiers constraint on Wikidata has exceptions. (98% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "allowed qualifiers" is for. (86% completed)
The allowed qualifiers constraint type (see also: Item allowed qualifiers constraint (Q21510851) [talk]; help page [talk]; PHP class) specifies that statements for the constrained Property should not have any qualifiers other than the listed ones.
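The check amounts to a set difference. A minimal sketch, assuming a simplified dictionary for a statement (not the Wikibase JSON format); the function name is hypothetical, and the allowed list in the example is illustrative, not the actual constraint configured on position held (P39). The mandatory qualifier constraint type described below is the mirror image: it tests that a given qualifier Property is present among the keys.

```python
def disallowed_qualifiers(statement, allowed):
    """Return, sorted, the qualifier Properties on a statement that are not in
    `allowed`. An empty list means the constraint is satisfied.

    `statement` is a hypothetical simplified dict whose "qualifiers" entry
    maps qualifier Property IDs to their values.
    """
    return sorted(set(statement.get("qualifiers", {})) - set(allowed))


# Toy statement for position held (P39) with start time (P580) and
# country (P17) qualifiers, checked against a hypothetical allowed list.
statement = {
    "property": "P39",
    "qualifiers": {"P580": ["+2001-01-01"], "P17": ["Q_example"]},
}
print(disallowed_qualifiers(statement, ["P580", "P582"]))  # ['P17']
```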
Commons link
Goal: No Commons link constraint on Wikidata has exceptions. (100% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "Commons link" is for. (64% completed)
The Commons link constraint type (see also: Item Commons link constraint (Q21510852) [talk]; help page [talk]; PHP class) specifies that values for the constrained Property should be valid names of existing pages on Wikimedia Commons within a certain namespace.
56 (86%) constraints specify the File namespace; 4 (6%), the Data namespace; 2 (3%), the Category namespace; 1, the Creator namespace; 1, the Institution namespace; and the one remaining constraint does not specify any namespace.
This is one of only two constraint types whose constraints have the mandatory severity level (52, 80%) more often than the normal severity level (13, 20%). Consistent with this, it has no exceptions and no constraints with the suggestion level. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.
difference within range
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "difference within range" is for. (51% completed)
The difference within range constraint type (see also: Item difference-within-range constraint (Q21510854) [talk]; help page [talk]; PHP class) specifies that the difference between the values for two Properties should be within a certain range or interval.
This constraint type only has 7 constraints defined for 7 Properties: date of death (P570), dissolved, abolished or demolished date (P576), service retirement (P730), date of disappearance (P746), date of baptism (P1636), time in space (P2873) and date of burial or cremation (P4602).
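A minimal sketch of the comparison, assuming exact calendar dates as Python `date` objects, one value per Property, and a difference measured in completed years; real Wikidata dates also carry precision and a calendar model, which this sketch ignores. The function names are hypothetical and the 0 to 150 year bounds are illustrative, not the configured values.

```python
from datetime import date


def completed_years(start, end):
    """Whole years elapsed from start to end, rounded down
    (negative when end precedes start)."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def difference_within_range(entity, constrained_prop, other_prop, minimum, maximum):
    """Check that constrained_prop's date minus other_prop's date, in completed
    years, lies within [minimum, maximum]. Entities missing either date are
    not checked. `entity` maps Property IDs to a single date (toy model)."""
    if constrained_prop not in entity or other_prop not in entity:
        return True
    difference = completed_years(entity[other_prop], entity[constrained_prop])
    return minimum <= difference <= maximum


# Hypothetical bounds of 0 to 150 years between date of birth (P569)
# and date of death (P570).
person = {"P569": date(1952, 3, 11), "P570": date(2001, 5, 11)}
print(difference_within_range(person, "P570", "P569", 0, 150))  # True
```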
inverse
Goal: No inverse constraint on Wikidata has exceptions. (96% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "inverse" is for. (76% completed)
The inverse constraint type (see also: Item inverse constraint (Q21510855) [talk]; help page [talk]; PHP class) specifies that the constrained Property has an inverse Property, and values for the constrained Property should have a statement with the inverse Property pointing back to the original entity.
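A minimal sketch of the back-link test, assuming a toy dictionary that maps entity IDs to their statements; the entity IDs and the function name are hypothetical, while follows (P155) and followed by (P156) serve as an inverse pair.

```python
def inverse_violations(entities, subject, prop, inverse_prop):
    """Return the values of `prop` on `subject` whose own entity has no
    `inverse_prop` statement pointing back to `subject`.

    `entities` maps entity IDs to {Property ID: [value IDs]} (toy model).
    """
    missing = []
    for value in entities.get(subject, {}).get(prop, []):
        back_links = entities.get(value, {}).get(inverse_prop, [])
        if subject not in back_links:
            missing.append(value)
    return missing


# follows (P155) / followed by (P156) on hypothetical entities A, B and C.
entities = {
    "A": {"P156": ["B"]},
    "B": {"P155": ["A"]},
    "C": {"P155": ["A"]},
}
print(inverse_violations(entities, "B", "P155", "P156"))  # []
print(inverse_violations(entities, "C", "P155", "P156"))  # ['A']
```

Passing the same Property as both `prop` and `inverse_prop`, for instance shares border with (P47), gives the check that the symmetric constraint type describes.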
mandatory qualifier
Goal: No mandatory qualifier constraint on Wikidata has exceptions. (99% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "mandatory qualifier" is for. (86% completed)
The mandatory qualifier constraint type (see also: Item required qualifier constraint (Q21510856) [talk]; help page [talk]; PHP class) specifies that a certain qualifier is mandatory for the constrained Property.
multi-value
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "multi-value" is for. (72% completed)
The multi-value constraint type (see also: Item multi-value constraint (Q21510857) [talk]; help page [talk]; PHP class) specifies that an entity that uses the constrained Property should have more than one statement with it: a single statement is a violation, while entities without the Property are not affected.
Only 22 constraints on Wikidata belong to this constraint type, 14 of them applied to Properties of the WikibaseItem type.
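Stated as code, the rule flags exactly one case. A minimal sketch, assuming a toy dictionary for the entity and counting every statement regardless of rank (an assumption; the real checker may treat ranks differently); the function name is hypothetical.

```python
def multi_value_ok(entity, prop):
    """Multi-value check on a toy entity ({Property ID: [values]}).

    Entities that do not use `prop` are not checked; an entity that uses it
    should have at least two statements with it.
    """
    return len(entity.get(prop, [])) != 1
```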
one-of
Goal: No one-of constraint on Wikidata has exceptions. (99% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "one-of" is for. (64% completed)
The one-of constraint type (see also: Item one-of constraint (Q21510859) [talk]; help page [talk]; PHP class) specifies that only certain values are allowed for a Property. It is limited to Properties of the WikibaseItem type. It would be possible to broaden its scope, although this could result in an overlap, and possible inconsistencies, with the format constraint type.
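A sketch of the membership test, with hypothetical Property and Item IDs and a toy entity dictionary. The none of constraint type described later in this report is the same test reversed, so both are shown; the function names are illustrative.

```python
def one_of_violations(entity, prop, allowed_values):
    """Return the values of `prop` on a toy entity ({Property ID: [values]})
    that are not in `allowed_values`."""
    allowed = set(allowed_values)
    return [value for value in entity.get(prop, []) if value not in allowed]


def none_of_violations(entity, prop, forbidden_values):
    """The none of check: the same membership test, reversed."""
    forbidden = set(forbidden_values)
    return [value for value in entity.get(prop, []) if value in forbidden]


entity = {"P_x": ["Q_a", "Q_z"]}  # hypothetical IDs
print(one_of_violations(entity, "P_x", ["Q_a", "Q_b"]))  # ['Q_z']
print(none_of_violations(entity, "P_x", ["Q_z"]))  # ['Q_z']
```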
range
Goal: No range constraint on Wikidata lacks a lower bound. (100% completed)
Goal: No range constraint on Wikidata has exceptions. (91% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "range" is for. (65% completed)
The range constraint type (see also: Item range constraint (Q21510860) [talk]; help page [talk]; PHP class) specifies that values for the constrained Property should be within a certain range or interval.
Range constraints are applied to 74.2% of Properties with integer constraints and 72.6% of Properties with no bounds constraints.
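At its core, a range check is a closed-interval comparison. A minimal sketch, assuming plain Python values; it ignores the uncertainty bounds a Wikibase quantity may carry and the precision of dates, and the function name is hypothetical.

```python
from datetime import date
from decimal import Decimal


def in_range(value, minimum=None, maximum=None):
    """Closed-interval check; None leaves that side unbounded.
    Works for any mutually comparable values, such as Decimal or date."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


print(in_range(Decimal("-3"), minimum=Decimal("0")))  # False
print(in_range(date(1850, 1, 1), date(1800, 1, 1), date(1900, 12, 31)))  # True
```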
symmetric
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "symmetric" is for. (62% completed)
The symmetric constraint type (see also: Item symmetric constraint (Q21510862) [talk]; help page [talk]; PHP class) specifies that a Property is symmetric, and values for that Property should have a statement with the same Property pointing back to the original Item.
value requires statement
Goal: No value requires statement constraint on Wikidata has exceptions. (95% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "value requires statement" is for. (67% completed)
The value requires statement constraint type (see also: Item value-requires-statement constraint (Q21510864) [talk]; help page [talk]; PHP class), originally named target requires claim or target required claim, specifies that values for the constrained Property should have a certain other statement.
This constraint type can be considered a more general, but less powerful, superclass of other constraint types, such as the inverse, symmetric and value type constraint types, which also require a statement on the value entity.
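The superclass relation can be illustrated with a sketch on a toy entity dictionary (hypothetical IDs and function name). Requiring, on each value, an inverse Property statement whose value is the subject itself reproduces the inverse check; requiring "instance of" with a set of classes approximates a value type check, but without following subclass links, which is one way in which it is less powerful.

```python
def value_requires_statement_violations(entities, subject, prop,
                                        required_prop, required_values=None):
    """Return the values of `prop` on `subject` whose entity lacks a
    `required_prop` statement; if `required_values` is given, that statement
    must also have one of those values.

    `entities` maps entity IDs to {Property ID: [value IDs]} (toy model).
    """
    violations = []
    for value in entities.get(subject, {}).get(prop, []):
        found = entities.get(value, {}).get(required_prop, [])
        if required_values is None:
            satisfied = bool(found)
        else:
            satisfied = any(v in required_values for v in found)
        if not satisfied:
            violations.append(value)
    return violations


# Inverse check as a special case: B follows (P155) A, so A must have
# followed by (P156) with the value B.
entities = {"B": {"P155": ["A"]}, "A": {"P156": ["B"]}}
print(value_requires_statement_violations(entities, "B", "P155", "P156", ["B"]))  # []
```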
value type
Goal: All Wikidata Properties of the WikibaseItem type have a value type constraint. (61% completed)
Goal: No value type constraint on Wikidata has exceptions. (95% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "value type" is for. (77% completed)
The value type constraint type (see also: Item value-type constraint (Q21510865) [talk]; help page [talk]; PHP class) specifies that values of the constrained Property should be instances or subclasses of a given entity. Since all Wikidata entities should at least be instances or subclasses of something, and since the value type constraint type allows arbitrarily general classes to be specified, all Properties of the WikibaseItem type could have a value type constraint, although only 61% of them do.
Value type constraints are applied:
- to 89.6% of Properties with contemporary constraints,
- to 80.8% of Properties with none of constraints, and
- to 77.8% of Properties with value requires statement constraints.
Concerning the relations that these constraints specify:
- 77.4% of value type constraints require a relation of "instance of,"
- 11.4% require a relation of "subclass of," and
- the remaining 11.2% allow either of the two relations.
The name of this constraint type may be considered inaccurate or confusing for two reasons:
- rdf:type is an RDF property to state that a resource is an instance of a class, but the value type constraint type is also used to specify that a resource should be a subclass of another class, for which rdfs:subClassOf is used instead;
- the word type is already used to refer to Property types (ExternalId, String, etc.), value types, entity types (Item, Lexeme, etc.) and, most confusingly, constraint types.
The name value class is proposed, since this constraint type is used to specify the class of which the value is an instance or a subclass. Likewise, for the current type constraint type, the name subject class is proposed.
allowed units
Goal: All Wikidata Properties of the Quantity type have an allowed units constraint. (80% completed)
Goal: No allowed units constraint on Wikidata has exceptions. (99% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "allowed units" is for. (88% completed)
The allowed units constraint type (see also: Item allowed units constraint (Q21514353) [talk]; help page [talk]; PHP class) specifies that only a certain set of units may be used with the constrained Property. It can only be applied to Properties of the Quantity type. 80% of these Properties have allowed units constraints, but ideally all of them should: values expressed in unexpected units whose conversion is impossible or unknown are not only useless for reuse, but can also easily go unnoticed and produce misinformation if software agents or data reusers take other units for granted.
Allowed units constraints are applied:
- to 84.9% of Properties with no bounds constraints,
- to 83.5% of Properties with range constraints, and
- to 82.6% of Properties with integer constraints.
Apart from the absence of a unit (158 Properties), the most frequently specified units are:
- metre (Q11573) (28 Properties),
- United States dollar (Q4917) (28 Properties),
- kilometre (Q828224) (23 Properties),
- foot (Q3710) (20 Properties),
- percent (Q11229) (20 Properties),
- Euro (Q4916) (19 Properties),
- day (Q573) (18 Properties),
- kelvin (Q11579) (18 Properties),
- degree Celsius (Q25267) (17 Properties), and
- year (Q577) (17 Properties).
To make this constraint type easier to use correctly and to maintain, it is suggested that implementations recognize classes of units (e.g., the class unit of length), which is usually better than trying to list most or all of the instances of such a class (e.g., light-year, astronomical unit, foot, parsec, ångström, metre, centimetre, millimetre and potentially hundreds of other instances). Once this is implemented, it is suggested that at least the longest lists of instances be replaced by their corresponding classes.
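The suggested class-based check could look like the following sketch. It is hypothetical: it assumes the constraint can list classes of units in addition to explicit units, which is the proposal above rather than current behavior, and it uses a toy graph whose class names are illustrative, not Wikidata's actual unit hierarchy. The function name is likewise hypothetical.

```python
def unit_allowed(unit, explicit_units, unit_classes, instance_of, subclass_of):
    """Accept a unit listed explicitly, or one that is an instance of an
    allowed class of units, directly or through "subclass of" links.

    Toy model: instance_of and subclass_of map an entity to its direct
    classes; None stands for "no unit".
    """
    if unit in explicit_units:
        return True
    seen = set()
    stack = list(instance_of.get(unit, []))
    while stack:
        cls = stack.pop()
        if cls in seen:  # guards against cycles in the class graph
            continue
        seen.add(cls)
        if cls in unit_classes:
            return True
        stack.extend(subclass_of.get(cls, []))
    return False


# Toy hierarchy: one class entry replaces a long list of length units.
instance_of = {"metre": ["SI unit of length"], "foot": ["unit of length"],
               "kilogram": ["unit of mass"]}
subclass_of = {"SI unit of length": ["unit of length"]}
print(unit_allowed("metre", set(), {"unit of length"}, instance_of, subclass_of))  # True
print(unit_allowed("kilogram", set(), {"unit of length"}, instance_of, subclass_of))  # False
```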
The Wikidata Properties with the longest, but not necessarily complete, lists of allowed units are:
- conversion to SI unit (P2370) (84 explicit units),
- length (P2043) (41 explicit units),
- bore (P2556) (39 explicit units),
- cost (P2130) (37 explicit units),
- total revenue (P2139) (36 explicit units),
- price (P2284) (35 explicit units),
- total assets (P2403) (26 explicit units),
- endowment (P6589) (24 explicit units),
- budget (P2769) (23 explicit units),
- net profit (P2295) (23 explicit units),
- concentration (P6274) (23 explicit units), and
- mass (P2067) (20 explicit units).
Modern constraint types
This section presents information on the current constraint types that were first implemented between 2018 and 2019, that is, when the use of templates to define constraints was already obsolete and after a period of more than two years without new constraint types. Most of these constraint types are not as well known among active editors as the constraint types above.
In addition to these constraint types, there exist the Lexeme requires lexical category, the Lexeme requires language and the Lexeme value requires lexical category constraint types, which are outside the scope of this study.
contemporary
Goal: No contemporary constraint on Wikidata has exceptions. (92% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "contemporary" is for. (48% completed)
The contemporary constraint type (see also: Item contemporary constraint (Q25796498) [talk]; help page [talk]; PHP class) specifies that the subject and value entities linked through the constrained Property should coexist at some point in history. It is only applicable to Properties of the WikibaseItem type.
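The core of this check is an interval-overlap test. A minimal sketch, assuming each entity's existence has already been reduced to a (start, end) pair of exact dates; which Properties supply those dates (for example date of death (P570) or dissolved, abolished or demolished date (P576) as end dates) and how date precision is handled are left out. The function name is hypothetical.

```python
from datetime import date


def coexisted(subject_span, value_span):
    """Each span is a (start, end) pair of dates. A missing start or end
    (None) is treated as open-ended, so it can never prove that the two
    entities did not coexist."""
    s_start, s_end = subject_span
    v_start, v_end = value_span
    s_start = date.min if s_start is None else s_start
    v_start = date.min if v_start is None else v_start
    s_end = date.max if s_end is None else s_end
    v_end = date.max if v_end is None else v_end
    # Two closed intervals overlap when each starts before the other ends.
    return s_start <= v_end and v_start <= s_end


person = (date(1900, 1, 1), date(1980, 1, 1))
print(coexisted(person, (date(1990, 1, 1), None)))  # False
print(coexisted(person, (date(1950, 1, 1), None)))  # True
```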
no bounds
Goal: No no bounds constraint on Wikidata has exceptions. (97% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "no bounds" is for. (26% completed)
The no bounds constraint type (see also: Item no-bounds constraint (Q51723761) [talk]; help page [talk]; PHP class) specifies that the value of the Property, which must be of type Quantity, should not be used with upper or lower bounds.
The no bounds constraint type serves its purpose, although according to active editors it is the least known constraint type and one of those with the least descriptive names. Correspondingly, it is the constraint type with the lowest number of checks (659,132).
allowed entity types
Goal: All Wikidata Properties specify the entity types to which they apply. (1% completed)
Goal: No allowed entity types constraint on Wikidata has exceptions. (100% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "allowed entity types" is for. (64% completed)
The allowed entity types constraint type (see also: Item allowed-entity-types constraint (Q52004125) [talk]; help page [talk]; PHP class) specifies the Wikibase entity types (Items, Lexemes, Properties…) on which the constrained Property should be used. Ideally, all Properties would have this information specified, but only 1% of Wikidata Properties do.
This constraint type has the third-highest proportion of mandatory constraints (48%), after the Commons link and Property scope constraint types. Consistent with this, it has no constraints with the suggestion level and no exceptions. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.
single best value
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "single best value" is for. (53% completed)
The single best value constraint type (see also: Item single-best-value constraint (Q52060874) [talk]; help page [talk]; PHP class) specifies that a Property should have a single "best" value. It may have any number of values, but exactly one of them, the "best" one by whatever criteria, should have preferred rank.
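A sketch of the rank rule as described above, with a hypothetical function name. Two behaviors are assumptions rather than documented facts: deprecated statements are ignored, and a single statement passes even without preferred rank.

```python
def single_best_value_ok(ranks):
    """`ranks` lists the ranks ("preferred", "normal" or "deprecated") of
    one entity's statements for the constrained Property."""
    # Assumption: deprecated statements are not candidates for "best".
    active = [rank for rank in ranks if rank != "deprecated"]
    # Assumption: a lone statement is trivially the best one.
    if len(active) <= 1:
        return True
    return active.count("preferred") == 1


print(single_best_value_ok(["preferred", "normal", "normal"]))  # True
print(single_best_value_ok(["normal", "normal"]))  # False
```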
none of
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "none of" is for. (67% completed)
The none of constraint type (see also: Item none-of constraint (Q52558054) [talk]; help page [talk]; PHP class) specifies that certain values are not allowed for the constrained Property.
This constraint type only applies to Properties of the WikibaseItem type.
integer
Goal: No integer constraint on Wikidata has exceptions. (98% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "integer" is for. (62% completed)
The integer constraint type (see also: Item integer constraint (Q52848401) [talk]; help page [talk]; PHP class) specifies that values for the constrained Property should be integers, that is, quantities without decimal places.
Integer constraints are applied to 72.6% of Properties with no bounds constraints.
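A minimal sketch of the test, using Python's `decimal` module so that amounts are compared exactly rather than as binary floats. It assumes amounts arrive as signed decimal strings such as "+42", a form Wikibase uses for quantity amounts in its JSON output; the function name is hypothetical.

```python
from decimal import Decimal


def is_integer_quantity(amount):
    """`amount` is a decimal string such as "+42" or "3.5".

    Assumption: "3.0" counts as an integer because its value has no
    fractional part; a stricter reading of "without decimal places"
    would reject it.
    """
    value = Decimal(amount)
    return value == value.to_integral_value()


print(is_integer_quantity("+42"))  # True
print(is_integer_quantity("3.5"))  # False
```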
Property scope
Goal: All Wikidata Properties have their scope (main value, qualifier, reference…) specified. (22% completed)
Goal: No Property scope constraint on Wikidata has exceptions. (99% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "Property scope" is for. (36% completed)
The Property scope constraint type (see also: Item property scope constraint (Q53869507) [talk]; help page [talk]; PHP class) specifies whether or not the constrained Property should be used for the main value of a statement, in a reference or as a qualifier. It applies to all Property types. Ideally, the scope of all Properties should be specified and this constraint type should be used for that; however, only 22% of Wikidata Properties have a Property scope constraint.
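The three contexts can be made concrete with a sketch that walks a single statement. It assumes a toy statement dictionary and hypothetical Property IDs and function name. Unlike allowed entity types, which concerns the kind of entity a Property is used on, this check concerns the position of the Property within a statement.

```python
def scope_violations(statement, allowed_scopes):
    """Return (Property, context) pairs used outside their allowed scope.

    `statement` is a toy dict with a main "property", a "qualifiers" dict and
    a "references" list of dicts; `allowed_scopes` maps a Property ID to a set
    of contexts among "main value", "qualifier" and "reference". Properties
    absent from `allowed_scopes` (no Property scope constraint) are accepted.
    """
    usages = [(statement["property"], "main value")]
    usages += [(prop, "qualifier") for prop in statement.get("qualifiers", {})]
    for reference in statement.get("references", []):
        usages += [(prop, "reference") for prop in reference]
    return [(prop, context) for prop, context in usages
            if prop in allowed_scopes and context not in allowed_scopes[prop]]


statement = {"property": "P1", "qualifiers": {"P2": ["x"]},
             "references": [{"P3": ["y"]}, {"P2": ["z"]}]}
allowed = {"P2": {"qualifier"}, "P3": {"reference"}}
print(scope_violations(statement, allowed))  # [('P2', 'reference')]
```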
This is one of only two constraint types whose constraints have the mandatory severity level (1,221, 78%) more often than the normal severity level (349, 22%). Consistent with this, it has no constraints with the suggestion level and is one of the constraint types with the fewest exceptions. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.
The Property scope constraint type is the second least known constraint type and, according to active editors, one of those with the least descriptive names. Its purpose can easily be confused with those of the type or allowed entity types constraint types.
citation needed
Goal: No citation needed constraint on Wikidata has exceptions. (97% completed)
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "citation needed" is for. (88% completed)
The citation needed constraint type (see also: Item citation-needed constraint (Q54554025) [talk]; help page [talk]; PHP class) specifies that statements for a certain Property should have at least one reference.
Since its name is borrowed from Wikipedia culture, it is one of the best-understood constraint types despite its recent introduction.
Complex constraints
The so-called complex constraints, which are based on templates, are distinguished less by their complexity than by the need to write them from scratch as SPARQL queries. The word "complex" has misleading connotations that may discourage work with these constraints. It is suggested to rename these template-based constraints custom, a name that is already used on some pages.
Suggested constraint types
When active editors were asked whether it would be better to develop new constraint types or to improve existing features:
- 47% said both things were equally important,
- 31% said it would be better to improve existing features,
- 17% did not answer, and
- only 5% said it would be better to focus on developing new constraint types.
The names of the proposed constraint types listed below are not necessarily the most convenient or the original ones.
acyclic
By a wide margin, the constraint type most demanded by active editors (40 supporters, 69%; 12 more than the second most demanded constraint type) would be one that checks "whether a Property causes impossible cycles (e.g., A is B's mother, B is C's mother, C is A's mother)."
This constraint type was proposed with the tentative name acyclic as the task phab:T173771 in 2017. Among other Properties, this constraint type would apply to the Property subclass of (P279) and therefore help to improve the Wikidata class hierarchy.
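A minimal sketch of what such a check could do, assuming the statements of one Property have been loaded into a dictionary mapping each entity to its values (hypothetical IDs and function name). It uses a standard depth-first search with three node states; on the full subclass of (P279) hierarchy an iterative version would be needed to stay within Python's recursion limit.

```python
def find_cycle(edges):
    """edges: dict mapping an entity to the values of one Property.
    Returns a list of entities forming a cycle (first entity repeated at
    the end), or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(edges):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# "mother" statements from the example: B's mother is A, C's is B, A's is C.
print(find_cycle({"B": ["A"], "C": ["B"], "A": ["C"]}))  # ['B', 'A', 'C', 'B']
```

A self-link such as {"A": ["A"]} is reported as ['A', 'A'], which is why a self-link constraint type could become partly redundant with acyclic.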
Other suggestions
- unused (28 supporters, 48%): "Check whether an obsolete Property is used." This constraint type was proposed as the task phab:T214244.
- sibling reference (22 supporters, 38%): "Check whether a Property used as a reference lacks a certain other Property as a reference (e.g., 'reference URL' lacks 'retrieved' with the access date)." This constraint type was proposed as the task phab:T229178.
- type of snak (20 supporters, 34%): "Check whether a Property is used with novalue or somevalue but should not." This constraint type was proposed as the task phab:T172129.
- label in language (18 supporters, 31%): "Check whether the Item with a Property does not have a label in a required language." This constraint type was proposed as the task phab:T195178.
- self-link (17 supporters, 29%): "Check whether a Property wrongly defines a self-link (the subject Item and the value Item are the same but should not be)." This constraint type was proposed as the task phab:T224837. It could become less useful with the presence of the acyclic constraint type.
- geographic precision (13 supporters, 22%): "Check whether a Property has a geographic location with a precision lower or higher than required."
- description in language (12 supporters, 21%): "Check whether the Item with a Property does not have a description in the required language." This constraint type was proposed as the task phab:T195179.
- time precision (10 supporters, 17%): "Check whether a Property has a date with a precision lower or higher than required."
- minimum number of statements (10 supporters, 17%): "Check whether the Items with a Property have fewer statements (regardless of the properties) than required." This constraint type was proposed as the task phab:T195181.
- calendar model (10 supporters, 17%): "Check whether a Property specifies dates with calendar models other than those required (e.g., other than the Gregorian calendar)."
- single value per language (10 supporters, 17%): "Check whether a Property has more than one value (text string) for each language but it should not." This constraint type was proposed as the task phab:T213967.
- identical values (7 supporters, 12%): "Check whether different properties that should be used with the same values are not identical." This constraint type was proposed as the task phab:T191963. This would be a special case of a general derivative statement constraint type.
- globe (6 supporters, 10%): "Check whether a Property has the geographic location of a celestial body other than the required one (e.g., other than Earth)."
- number of values (6 supporters, 10%): "Check whether a Property has fewer or more statements than required on the same Item (customized minimum and maximum)." This constraint type was proposed as the task phab:T172134.
Acknowledgements
This work has been made possible by a Rapid grant from the Wikimedia Foundation. You are invited to participate in the program.
Thanks to the Wikimedia Foundation, to Wikimedia Deutschland, to the 58 active editors who completed the survey on Property constraints and to all those who at some point have dedicated their time and effort to improving the quality of Wikidata data.
In memory of Amrapali Zaveri (Q34315853) (1984–2020).