Data Quality Rule


2023/10/27 19:04 1/6 Data quality rule

Data quality rule

Definition

A data quality rule is a rule used to validate data values or data records.

Synonyms

Data validation rule


Data quality rule specification

Purpose

The purpose of a data quality rule is to prevent, monitor or report insufficient quality of data values
and data records.

Life cycle

Phase   Activity
Plan    * Derive data quality rules from business rules
        * Collect/identify data quality rules
        * Specify data quality rules
        * Document data quality rules as metadata
Do      * Establish data quality rules
        * Implement/apply data quality rules in a database or application
Check   * Test data quality rules
        * Evaluate data quality rules
Act     * Adapt/maintain data quality rules

Characteristics and requirements

Characteristic    Requirement
Accessibility     Data quality rules are accessible to data users. This ensures that data users
                  can ask questions about the rules and provide feedback on them.
Unambiguity       Data quality rules are unambiguous. This ensures that they can be implemented
                  in a database or application.
Maintainability   Data quality rules can be maintained efficiently.

Relations

Data quality rule is child of metadata


Data quality rule is child of rule
Data quality rule is element of a data quality management system
Data quality rule is associated with specific data quality dimensions
Data quality rule can be applied to data values
Data quality rule can be applied to accuracy


Data quality rule can be applied to completeness


Data quality rule can be applied to consistency
Data quality rule can be applied to data records
Data quality rule can be applied to uniqueness
Data quality rule can be integrated in a database
Data quality rule can be integrated in an application
Data quality rule is derived from a business rule
Data quality rule prevents data issues
Data quality rule is applied firstly to critical data elements
Data quality rule is needed for data quality monitoring
Data quality rule is needed for data cleansing

Classification A

In Table 1, data quality rules are classified into three categories and their subcategories.

Table 1: Category, subcategories, and examples.

Category: Simple data element content rules.
These are considered “simple” because you only need to inspect the contents of a single data element and check to see if the content meets the rules.
* Subcategory: Valid values, range, data type, pattern, and domain.
* Subcategory: Optional versus mandatory (evaluates completeness).
* Subcategory: Reasonable distribution of values.
  Example: In a customer database, you would expect a fairly even distribution of birthdays; a much larger number of birthdays on a given day of the year probably indicates a problem.

Category: Cross data element validation rules.
The rules require inspecting values in multiple data elements (typically in a single data file) to determine whether the data meets the quality rules.
* Subcategory: Valid values that depend on other column values.
  Example: An overall list of location codes might pass the simple-column content rules, but only a smaller list of locations is valid if, for example, the region code is set to “West.”
* Subcategory: Optional becomes mandatory when other columns contain certain data.
  Example: The Value of Collateral field may be optional, but if the loan type is “mortgage,” a positive value must be filled into the Value of Collateral field.
* Subcategory: Mandatory becomes null when other columns contain certain data.
  Example: The Writing Insurance Agent Name field might normally be mandatory, but if the Origination Point is “web” (indicating the customer applied for the policy online), the Writing Insurance Agent Name field must then be blank.
* Subcategory: Cross-table validation rule. As the name suggests, these data quality rules check columns (and combinations of columns) across tables.
  Example: An example cross-validates the name of a city with the name of a state in an address table; that is, Minneapolis is not in Wisconsin.

Category: Cross data files validation rules.
As the name suggests, these data quality rules check data elements (and combinations of data elements) across data files.
* Subcategory: Mandatory presence of foreign-key relationships.
  Example: If an account must have a customer, then the account table must have a value in the Customer ID column that matches a value in the Customer ID column of the Customer table.
* Subcategory: Optional presence of foreign-key relationships depending on other data.
  Example: If the Loan_Type is “mortgage” in the Loan table, there must be a matching value for Loan_ID in the Collateral table. On the other hand, if the Loan_Type is “unsecured,” then there must not be a matching value for Loan_ID in the Collateral table because “unsecured” means there is no collateral for the loan.
* Subcategory: Columns in different tables are consistent.
  Example: If the Collateral_Value column contains a value above a certain level, the Appraisal_Type must be “in person” because of the high value of the property.
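
The loan examples in Table 1 can be sketched in code. The following Python sketch is illustrative only: the `Loan` record, its field names, and the function names are assumptions made for this example, not part of any particular tool. It shows one cross data element rule (optional becomes mandatory) and one cross data files rule (presence of a foreign-key relationship depending on other data).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Loan:
    # Hypothetical record layout, assumed for illustration.
    loan_id: str
    loan_type: str                      # e.g. "mortgage" or "unsecured"
    collateral_value: Optional[float]   # None means the field is blank


def check_collateral_value(loan: Loan) -> bool:
    """Cross data element rule: the optional field becomes mandatory for mortgages."""
    if loan.loan_type == "mortgage":
        return loan.collateral_value is not None and loan.collateral_value > 0
    return True


def check_collateral_reference(loan: Loan, collateral_loan_ids: Set[str]) -> bool:
    """Cross data files rule: the foreign-key presence depends on the loan type."""
    has_collateral = loan.loan_id in collateral_loan_ids
    if loan.loan_type == "mortgage":
        return has_collateral       # a matching Collateral record must exist
    if loan.loan_type == "unsecured":
        return not has_collateral   # a matching Collateral record must not exist
    return True


# Loan_ID values found in the (assumed) Collateral table.
collateral_ids = {"L1", "L3"}
loans = [
    Loan("L1", "mortgage", 250000.0),
    Loan("L2", "mortgage", None),
    Loan("L3", "unsecured", None),
]
for loan in loans:
    print(loan.loan_id,
          check_collateral_value(loan),
          check_collateral_reference(loan, collateral_ids))
```

Each function returns True when the record satisfies the rule, so the same checks could be used either to reject input (prevention) or to count violations (monitoring).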

Classification B

In Table 2, data quality rules are classified into ten categories.

Table 2: Category of Data quality rule, description, and example

Category: Domain List
Description: A domain list rule defines a list of values that a data element is allowed to have.
Example: The Gender data element can have 'M' or 'F'.

Category: Domain Pattern List
Description: A domain pattern list rule defines a list of patterns that a data element is allowed to conform to. The patterns are defined in the regular expression syntax.
Example: An example pattern for a telephone number is as follows: (^[[:space:]]*[0-9]{3}[[:punct:][:space:]]?[0-9]{4}[[:space:]]*$)

Category: Domain Range
Description: A domain range rule defines a range of values that a data element is allowed to have.
Example: The value of the salary data element can be between 100 and 10000.

Category: Common Format / Pattern
Description: A common format rule defines a known common format that a data element is allowed to conform to.
Example: This rule type has many subtypes: Telephone Number, IP Address, SSN, URL, E-mail Address.

Category: No Nulls
Description: A no nulls rule specifies that the data element cannot have null values.
Example: The department_id data element for an employee in the Employees table cannot be null.

Category: Functional Dependency
Description: A functional dependency defines that the data in the data object may be normalized or derived.

Category: Unique Key
Description: A unique key data rule defines whether a data element or group of data elements are unique in the given data object.
Example: The name of a department should be unique.

Category: Referential
Description: A referential data rule defines the type of a relationship (1:x) a value must have to another value.
Example: The department_id data element of the Departments table should have a 1:n relationship with the department_id data element of the Employees table.

Category: Name and address
Description: A name and address data rule evaluates a group of data elements as a name or address.

Category: Custom
Description: A custom data rule applies a SQL expression that you specify to its input parameters.
Example: VALID_DATE with two input parameters, START_DATE and END_DATE. A valid expression for this rule is: "THIS"."END_DATE" > "THIS"."START_DATE"
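
Several of the categories in Table 2 reduce to small predicates. The Python sketch below is a minimal illustration under stated assumptions: all function and constant names are hypothetical, the range bounds are treated as inclusive, and the telephone pattern is a Python translation of the example pattern (Python's `re` module has no POSIX classes such as `[:punct:]`, so the punctuation set is built from `string.punctuation`).

```python
import re
import string
from datetime import date
from typing import Optional, Sequence

ALLOWED_GENDERS = {"M", "F"}    # Domain List example
SALARY_RANGE = (100, 10000)     # Domain Range example, bounds assumed inclusive

# Domain Pattern List example: three digits, one optional punctuation or
# whitespace character, four digits, with optional surrounding whitespace.
PHONE_PATTERN = re.compile(
    r"^\s*[0-9]{3}[" + re.escape(string.punctuation) + r"\s]?[0-9]{4}\s*$"
)


def in_domain_list(value: str) -> bool:
    return value in ALLOWED_GENDERS


def in_domain_range(value: float) -> bool:
    low, high = SALARY_RANGE
    return low <= value <= high


def matches_phone_pattern(value: str) -> bool:
    return PHONE_PATTERN.match(value) is not None


def is_not_null(value: Optional[object]) -> bool:
    """No Nulls rule: the data element must have a value."""
    return value is not None


def is_unique(values: Sequence[str]) -> bool:
    """Unique Key rule: no value may occur more than once."""
    return len(values) == len(set(values))


def valid_date(start_date: date, end_date: date) -> bool:
    """Custom rule from Table 2: END_DATE must be later than START_DATE."""
    return end_date > start_date


print(in_domain_list("M"), in_domain_range(50), matches_phone_pattern("555-1234"))
print(is_unique(["Sales", "HR", "Sales"]))
```

Keeping each rule as a separate function mirrors the table: one category, one check, which also makes individual rules easy to test.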

Examples A

Table 3 shows how a Data quality rule is derived from a business rule.

Table 3: Example of a data quality rule derived from a business rule

Business rule: The marital status code may have values of single, married, widowed, and divorced. It may not be left blank. A value must be picked when entering a new customer. The values for widowed and divorced are tracked separately from single because risk factors are sensitive to whether the customer was previously married and is not married anymore.

Data quality rule: “Customer.Mar_Stat_Cd” may be “S,” “M,” “W,” or “D.” Blank is considered an invalid value.
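
As a rough sketch of how the derived rule could be applied for monitoring, the following Python example checks a set of customer records and reports the violating ones. The customer identifiers, the dictionary layout, and the function names are assumptions made for illustration; only the four codes and the "blank is invalid" condition come from Table 3.

```python
from typing import Dict, List, Optional

# Valid codes for Customer.Mar_Stat_Cd, taken from Table 3.
VALID_MARITAL_STATUS_CODES = {"S", "M", "W", "D"}


def is_valid_marital_status(code: Optional[str]) -> bool:
    """Blank (None, empty, or whitespace-only) is invalid; otherwise the code must be in the list."""
    if code is None or code.strip() == "":
        return False
    return code in VALID_MARITAL_STATUS_CODES


def find_violations(customers: Dict[str, Optional[str]]) -> List[str]:
    """Return the IDs of customers whose marital status code breaks the rule."""
    return [
        customer_id
        for customer_id, code in customers.items()
        if not is_valid_marital_status(code)
    ]


# Hypothetical sample data: customer ID -> marital status code.
customers = {"C001": "M", "C002": "", "C003": "X", "C004": None, "C005": "W"}
print(find_violations(customers))  # ['C002', 'C003', 'C004']
```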

Examples B

Table 4 shows data quality rules for two data elements.

Table 4: Example of Data quality rules


Data element: Email
* An email must contain the '@' sign.
* '@' must be used only once.
* An email must contain any or all of the following: letters, digits, non-alphabetic characters, such as ! # $ % & ' * + - / = ?
* The 'Customer's full name' field an email refers to must not be 'Null'.

Data element: Customer name
* Customer's full name must consist only of letters; no other characters allowed.
* Only first letters in customer name, middle name (if any), and surname must be capitalized.
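
The rules in Table 4 could be implemented roughly as follows. This Python sketch rests on several assumptions that are not in the table: the character rule is applied only to the part before the '@' sign, the dot is additionally allowed there, name parts are separated by single spaces, and all function names and violation messages are hypothetical.

```python
import string
from typing import List, Optional

# Letters, digits and the special characters listed in Table 4;
# the trailing "." is an added assumption so that "jane.doe" is accepted.
EMAIL_ALLOWED = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?" + ".")


def check_email(email: str, customer_full_name: Optional[str]) -> List[str]:
    """Return the violated email rules; an empty list means the value passes."""
    violations = []
    if email.count("@") != 1:
        violations.append("email must contain exactly one '@'")
    local_part = email.split("@")[0]
    if not local_part or any(ch not in EMAIL_ALLOWED for ch in local_part):
        violations.append("email contains characters outside the allowed set")
    if customer_full_name is None:
        violations.append("customer's full name must not be null")
    return violations


def check_customer_name(full_name: str) -> List[str]:
    """Return the violated name rules; an empty list means the value passes."""
    violations = []
    parts = full_name.split(" ")
    if not all(part.isalpha() for part in parts):
        violations.append("name must consist only of letters")
    elif not all(p[0].isupper() and p[1:] == p[1:].lower() for p in parts):
        violations.append("only the first letter of each name part may be capitalized")
    return violations


print(check_email("jane.doe@example.com", "Jane Doe"))
print(check_customer_name("jane Doe"))
```

Returning a list of violations instead of a single boolean makes it possible to report which rule failed, which is useful when the checks feed data quality monitoring or cleansing.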

Tips

1. Involve subject matter experts from various departments.
2. Be moderate with the number of rules. Define rules that really matter according to the subject matter experts. Don’t create rules that are already implemented as input checks in the source system.
3. Favour a step-by-step approach.
4. Treat each field of the database individually and create rules accordingly.
5. Decide between centralized and local storage for your data quality rules.



From:
https://datamanagement.wiki/ - Data Management Wiki

Permanent link:
https://datamanagement.wiki/data_quality_management_system/data_quality_rule

Last update: 2023/10/17 15:55
